
plan: space feature end-to-end happy path — single space, single task #991

Merged
lsm merged 17 commits into dev from plan/space-feature-end-to-end-happy-path-single-space-single-task on Mar 27, 2026

Conversation


@lsm lsm commented Mar 26, 2026

Summary

  • 9-milestone plan (37 tasks) for end-to-end happy path: human converses with Space Agent, creates a task, workflow runs through Planning → Plan Review → Human Approval → Coding → 3 Parallel Code Reviewers → QA → Done
  • Core Channel + Gate architecture: channels are simple pipes, gates are independent filter entities with composable conditions (check, count, all, any)
  • Numeric task IDs (space-scoped, monotonically increasing) and space slugs for human-friendly navigation (neokai-dev#5)
  • Worktree isolation (one per task, task-title-derived slug names)
  • Live workflow canvas visualization with clickable approval gates and artifacts/diff view
  • Online integration tests and E2E Playwright tests
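
The "composable conditions" in the Channel + Gate architecture above can be sketched as a small discriminated union. This is an illustrative sketch only — the type and field names (`GateCondition`, `GateContext`, `test`, `atLeast`) are assumptions, not the actual NeoKai schema:

```typescript
// Hypothetical sketch of composable gate conditions (check, count, all, any).
interface GateContext {
  // Results reported by upstream nodes, e.g. { Review1: "passed" }
  results: Record<string, string>;
}

type GateCondition =
  | { kind: "check"; test: (ctx: GateContext) => boolean }
  | { kind: "count"; test: (result: string) => boolean; atLeast: number }
  | { kind: "all"; conditions: GateCondition[] }
  | { kind: "any"; conditions: GateCondition[] };

function evaluateGate(cond: GateCondition, ctx: GateContext): boolean {
  switch (cond.kind) {
    case "check":
      return cond.test(ctx);
    case "count":
      // Open when at least `atLeast` upstream results satisfy the predicate.
      return Object.values(ctx.results).filter(cond.test).length >= cond.atLeast;
    case "all":
      return cond.conditions.every((c) => evaluateGate(c, ctx));
    case "any":
      return cond.conditions.some((c) => evaluateGate(c, ctx));
  }
}

// e.g. a gate after the 3 parallel code reviewers: all three must pass.
const allReviewersPassed: GateCondition = {
  kind: "count",
  test: (r) => r === "passed",
  atLeast: 3,
};
```

Because gates are independent filter entities, a single condition tree like `allReviewersPassed` can sit on a plain channel without the channel itself knowing anything about reviewer semantics.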

Plan files

  • docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/00-overview.md
  • docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/01-gate-data-store.md
  • docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/02-enhanced-node-agent-prompts.md
  • docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/03-extended-coding-workflow.md
  • docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/04-worktree-isolation.md
  • docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/05-qa-agent-node.md
  • docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/06-human-gate-canvas-ui.md
  • docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/07-online-integration-test.md
  • docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/08-e2e-test.md
  • docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/09-bug-fixes-and-hardening.md


@lsm lsm left a comment


🤖 Review by claude-opus-4-6 (Anthropic)

Model: claude-opus-4-6 | Client: NeoKai | Provider: Anthropic

Plan Review: Space Feature End-to-End Happy Path

Overall this is a solid, well-structured 8-milestone plan with correct dependency ordering and appropriate scoping for a single-task, single-space happy path. However, there are several inconsistencies and gaps that need to be resolved before implementation.


P1: Workflow Structure Inconsistencies

Task 2.1 (M2) and the target workflow graph are misaligned.

The target workflow graph in 02-extended-coding-workflow.md shows:

Plan --[human]--> Code --[always]--> Review --[passed]--> Verify --[passed]--> Done

But Task 2.1 subtask 3 describes channels as:

"Review->Verify (task_result: passed), Verify->Plan (task_result: failed, cyclic)"

This Verify->Plan cyclic channel does not appear anywhere in the target graph. The graph shows no cyclic channels at all. Meanwhile, the QA workflow graph in 04-qa-agent-node.md shows QA -> Done and QA -> Code (failed), with QA replacing Verify entirely. This creates confusion:

  1. Does Verify -> Plan exist or not in M2?
  2. Is Verify a distinct node from QA, or does QA replace Verify?
  3. The overview says "Verify step is removed/replaced by QA" -- but if Verify doesn't exist in M2's target graph, this replacement is already baked in, making the statement redundant.

Recommendation: Clarify the exact 5-node graph for M2. If the goal is Plan->Code->Review->Verify->Done with no cyclic channels in M2 (and the reviewer feedback loop is purely via Review->Code), then remove the Verify->Plan channel from the task description and ensure the M4 QA description clearly says it adds a QA node rather than "replacing" one that was never defined with a feedback loop.

The Reviewer->Coder feedback loop is underspecified in the target graph.

The target graph in 02-extended-coding-workflow.md shows Review --[passed]--> Verify but the surrounding text mentions "reviewer sends feedback to coder" on failure. The cyclic Review->Code channel is described in the task but does not appear as an arrow in the graph. Either add Review --[failed]--> Code to the graph, or clarify that the graph only shows the happy path and failure loops are documented in the task text (not shown in the diagram).
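
One way to keep the happy-path diagram and the failure loops from drifting apart is to write the channels as data and derive the diagram from it. A hypothetical sketch of the 5-node M2 graph — field names (`gate`, `isCyclic`) are illustrative, not the actual workflow schema:

```typescript
interface Channel {
  from: string;
  to: string;
  gate: string;       // condition that must hold for the message to pass
  isCyclic?: boolean; // marks feedback loops excluded from the happy-path diagram
}

// Happy path Plan -> Code -> Review -> Verify -> Done, plus the one failure loop.
const channels: Channel[] = [
  { from: "Plan", to: "Code", gate: "human" },
  { from: "Code", to: "Review", gate: "always" },
  { from: "Review", to: "Verify", gate: "task_result: passed" },
  { from: "Review", to: "Code", gate: "task_result: failed", isCyclic: true },
  { from: "Verify", to: "Done", gate: "task_result: passed" },
];

// Render only acyclic channels when drawing the happy-path diagram.
const happyPath = channels.filter((c) => !c.isCyclic);
```

With this split, "the graph only shows the happy path" stops being a documentation convention and becomes a property of the data.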


P1: Human Gate Mechanism Is Scattered Across Tasks

The human gate on Plan->Code is mentioned in multiple places but has no dedicated task.

  • 05-human-gate-and-completion.md Task 5.1 covers fixing the gate's pause/resume behavior
  • Task 2.1 seeds the workflow template with the human gate
  • Task 5.3 covers Space chat agent task creation from conversation

But the actual mechanism for how a human approves -- the UI component, the RPC call, the state transition -- is spread across these tasks without a clear single owner. Specifically:

  • How does the human see and click "Approve" in the Space chat UI?
  • Is there a dedicated task for building the approval button in the workflow view?

Recommendation: Add a clarifying note in Task 5.1 or Task 5.3 explicitly listing the components involved (UI approval widget, RPC handler for approval, state transition). Alternatively, split Task 5.1 into two tasks: one for the backend gate logic, one for the frontend approval UI.


P1: Verify vs QA Ambiguity

The plan states in M4 that "Verify step is removed/replaced by QA (QA subsumes the Verify role)." But:

  1. Verify already exists in CODING_WORKFLOW today
  2. The M2 extended workflow (Task 2.1) defines a Verify node with a task_result: failed -> Plan cyclic channel that has no QA counterpart
  3. M4 says QA replaces Verify but adds QA as a new node rather than modifying the existing Verify node

This is internally contradictory. If QA replaces Verify, then M2's workflow definition should either (a) not include Verify, or (b) include QA from the start.

Recommendation: Resolve this by M2. Either:

  • Option A: M2's CODING_WORKFLOW_V2 already has QA (not Verify) as the pre-completion node
  • Option B: M2's CODING_WORKFLOW_V2 has Verify, and M4 modifies it to be QA

P2: Iteration Cap Behavior Is Undefined

The plan mentions maxIterations in Task 2.1 but never defines:

  1. What happens when the cap is reached
  2. Whether the cap applies per-cycle (e.g., per Review->Code loop) or per entire run
  3. Whether the iteration count is per-channel or global

This matters for reliability: if a workflow hits the iteration cap, does it fail? Get stuck? Notify the human?

Recommendation: Add a section in 02-extended-coding-workflow.md or 08-bug-fixes-and-hardening.md explicitly defining the iteration cap behavior. Task 8.3 should include "iteration cap enforcement" as a hardening task.
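
To make the recommendation concrete, here is a minimal sketch of one possible cap policy: a single global counter that fails the run with a structured reason when exhausted. This is an assumption about behavior the plan leaves undefined, and all names are hypothetical:

```typescript
interface RunState {
  iterations: number; // global counter across all cyclic channels
  status: "running" | "failed";
  failureReason?: string;
}

const MAX_ITERATIONS = 3;

// Called each time a message traverses a cyclic channel (e.g. Review -> Code).
function onCyclicTraversal(run: RunState): RunState {
  const iterations = run.iterations + 1;
  if (iterations > MAX_ITERATIONS) {
    // Fail fast and surface a structured reason the UI can show the human.
    return {
      iterations,
      status: "failed",
      failureReason: `iteration cap (${MAX_ITERATIONS}) exhausted`,
    };
  }
  return { ...run, iterations };
}
```

Whether the counter should instead be per-channel, and whether exhaustion should notify the human rather than fail the run, are exactly the decisions the plan needs to record.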


P2: M8 Error Handling Is Too Vague for a Test-Driven Milestone

Task 8.3 describes adding "timeout enforcement," "cleanup on cancellation," and "structured error messages" but has no concrete acceptance criteria beyond "tests pass." This is a catch-all milestone that will likely balloon in scope.

Recommendation: Add a preliminary step in M8 (Task 8.0 or first subtask of 8.1) that creates a concrete bug triage doc listing every issue found in M6 and M7, with estimates. This gives M8 a concrete scope instead of being an undefined bucket.


P2: Task 6.2 Edge Cases Depend on Task 6.1, Creating a Long Tail

Task 6.2 (edge cases: iteration cap, concurrent tasks, cancellation, crash recovery) depends on Task 6.1. If 6.1 takes a long time, 6.2 is blocked. Some of these edge cases (cancellation, crash recovery) could be designed alongside the main pipeline in M5 rather than discovered in M6.

Recommendation: Consider whether iteration cap behavior and cleanup on cancellation should be specified in earlier milestones (M4/M5) rather than discovered in M6. Specifically, the iteration cap is mentioned in M2's task but its behavior is never specified.


P3: Plan Text Formatting Inconsistencies

  • 01-enhanced-node-agent-prompts.md and 04-qa-agent-node.md repeat the milestone header at the top of the file even though the filename already identifies the milestone; the other files use standard headers. Minor, but worth aligning.
  • The --[human] gate notation in the workflow graphs uses a different format than --[task_result gate: passed] which mixes shorthand with full names. Pick one notation and use it consistently.

Summary

The plan is well-structured with correct dependency ordering and appropriate scoping. The three critical issues to resolve before implementation are: (1) the workflow graph and task descriptions don't agree on the Verify node's connections, (2) the human gate approval mechanism needs a single owning task, and (3) the Verify/QA relationship needs to be disambiguated. These are scoping and clarity issues rather than missing work, and resolving them will prevent implementation confusion.

Recommendation: REQUEST_CHANGES (P1 issues need resolution)


@lsm lsm left a comment


🤖 Review by glm-5-turbo (GLM)

Model: glm-5-turbo | Client: NeoKai | Provider: GLM

Plan Review: space feature end-to-end happy path

Overall this is a solid 8-milestone plan. The phased approach (stabilize existing workflow first, then extend, then test) is sound. However, I have several critical issues that need to be addressed before this plan is actionable.


P0 Issues (Blocking — must fix before implementation)

1. M2 Task 2.1: Verify -> Plan channel is conceptually wrong

The workflow diagram shows Verify -> Plan (on fail) — when verification fails, control returns to the Planner, not the Coder. This means a failed verification triggers re-planning rather than a fix attempt. The plan provides no rationale for this choice.

Problem: If verification fails (e.g., tests are broken), the human must review a new plan before the Coder can fix it. This adds unnecessary human involvement in what should be an automated fix loop.

Fix needed: Either (a) explain why this is the intended behavior, or (b) change the channel to Verify -> Code (on fail) so the Coder can fix directly without re-planning.


2. M5 Task 5.1: Human gate UX is completely absent

The task says "Human can approve via Space chat or RPC" but provides zero detail on:

  • What does the human SEE when the gate blocks? (A message? A UI indicator?)
  • What does the human DO to approve? (Type "approve"? Click a button?)
  • What triggers the notification to the human that approval is needed?

The request_human_input tool is mentioned as the mechanism but is never defined anywhere in the plan. This is a critical gap — without it, the human gate cannot work.

Fix needed: Add explicit subtasks covering the human-facing UX: what the human sees, how they approve, and how that approval is communicated to the runtime (Space chat message parsing? RPC call? UI button?).


3. M3 Task 3.1: Worktree isolation is underspecified

The task says "Node agents need isolated worktrees" but the existing code in TaskAgentManager.createSubSession() passes workspacePath: space.workspacePath directly — no worktree isolation exists today. The CUSTOM_AGENT_FEATURES has worktree: false which appears to be a feature flag, not actual worktree creation.

Fix needed: Clarify what "isolated worktree" means here:

  • Does it mean creating a NEW git worktree for each node agent session?
  • Or does it mean something else (branch isolation within the same worktree)?
  • The Room system's WorktreeManager is mentioned as a potential reuse target, but there's no verification that it supports Space worktrees.

Subtask 4 ("Ensure the WorktreeManager can be reused for Space worktrees") should be expanded into a full investigation + implementation task, not a vague assertion.


P1 Issues (Should fix — important gaps)

4. M5 Task 5.3: Conversation -> task -> workflow flow is underspecified

The task says "verify the Space chat agent can parse a human request like 'implement user authentication' and call start_workflow_run" but doesn't explain:

  • How does the Space chat agent parse intent?
  • How does it select which workflow to run?
  • What happens if the request is ambiguous?

This is the entry point of the entire happy path. Without understanding this flow, it's impossible to verify the end-to-end path works.

Fix needed: Add concrete acceptance criteria describing how a human's message triggers a workflow run. The existing space-chat-agent.ts likely has relevant code — the plan should reference it and specify what changes are needed.


5. M4 Task 4.3: QA feedback to Coder — two parallel feedback paths?

The M4 workflow shows QA -> Code (failed) cyclic channel. But M2 already has Review -> Code (failed). This means there are TWO ways for feedback to reach the Coder:

  • QA can send directly to Coder (bypassing Review)
  • Review can send to Coder

The plan doesn't explain whether QA failure goes directly to Coder or through Review. If QA sends directly, the Coder might receive conflicting feedback from both Review and QA. If QA goes through Review, the flow is: Coder -> Review (pass) -> QA (fail) -> Review -> Coder — which adds latency.

Fix needed: Clarify the feedback topology. Is there a Review -> QA channel? Or does QA feedback go directly to Coder? The channel routing should be explicit.


6. M6 Task 6.1: Integration test is extremely high-risk

The test has 14 manual simulation steps (a-n). This is a brittle test that will be hard to debug when it fails. No strategy for test maintainability is mentioned.

Fix needed: Consider:

  • Breaking the test into smaller integration tests (one per milestone) rather than one massive end-to-end test
  • Using helper functions to encapsulate common simulation patterns (mock agent completion, verify node activation)
  • Adding explicit timeout handling for async operations

7. Dependency ordering: M3 blocks M4 unnecessarily

M4 (QA Agent) depends on M2 (extended workflow) and M3 (session factory). But the QA agent's existence doesn't logically require worktree isolation or feature flag configuration — those are runtime concerns for node agent execution. QA could be implemented in parallel with M3, not after it.

Fix needed: Re-examine the dependency graph. Consider moving M3 (or parts of it) to run in parallel with M4, or drop M3 -> M4 dependency if M3's work doesn't actually gate M4.


P2 Issues (Important suggestions)

8. M1 Task 1.3: Reviewer self-review prevention may be irrelevant

The task mentions "self-review prevention (EVENT=COMMENT when reviewing own PR)" as an acceptance criterion for the reviewer node agent. But in the Space workflow, the reviewer is a node AGENT, not a human developer, and it is a different agent from the Coder that created the PR — so there is no self-review scenario to prevent.

This self-review prevention logic exists in the Room system's human review flow. For Space node agents, this check is unnecessary.

Suggestion: Remove the self-review prevention criterion from M1 Task 1.3, or explain why it's needed for automated agents.


9. M4 Task 4.2: QA agent needs GitHub authentication

The QA agent checks gh pr checks and gh pr view. These gh CLI commands require GitHub authentication. The plan doesn't address:

  • How does the QA agent authenticate with GitHub?
  • Does it use the same credentials as the Coder?
  • Are there cases where gh is not authenticated and the QA check fails for auth reasons rather than actual failures?

Suggestion: Add a subtask to verify gh authentication works in node agent sessions, or document how credentials are propagated.


10. Missing: What happens when maxIterations is exhausted?

The workflow has maxIterations: 3 to cap the Coder->Review loop. When this cap is hit, what happens? Does the run fail? Does it block forever? Does a human need to intervene?

Suggestion: Add a task or subtask covering the maxIterations exhaustion scenario. This is a significant failure mode that users will encounter.


11. Task 2.1: V2 workflow seeding strategy needs clarification

The task says "Update seedBuiltInWorkflows to also seed CODING_WORKFLOW_V2 alongside the existing workflows (additive, not replacing)". But if V2 is the better workflow, why keep V1? Will V1 be deprecated? Can users still create V1 workflows?

Suggestion: Clarify the migration strategy. Is V1 being kept for backward compatibility? Will existing spaces automatically get V2?


P3 Issues (Nits)

12. M1 Task 1.2: "injectWorkflowContext flag" — where is this defined?

The task says "Ensure the prompt works with the injectWorkflowContext flag". Looking at the existing custom-agent.ts, this flag exists on SpaceAgent and is used to inject workflow structure into the task message. The task should reference the existing implementation rather than implying it's a new feature.


13. The TODO in SpaceRuntime line 395-396 references "Milestone 6"

The existing code already has a TODO comment saying "Milestone 6: pass resolvedChannels to session group creation". This suggests either (a) M6 is partially pre-implemented or (b) the TODO is stale. The plan should clarify the current state of M6's work.


Summary

The plan has good structure and the phased approach is sound. However, three P0 issues must be resolved before implementation:

  1. Verify -> Plan channel routing — needs rationale or fix
  2. Human gate UX — completely absent from the plan
  3. Worktree isolation — underspecified and may require more than a "verify" task

The P1 issues (conversation->task flow, QA feedback topology, M6 test complexity, dependency ordering) are all addressable with more detail in the respective tasks.

Recommendation: REQUEST_CHANGES — the P0 issues need resolution before this plan is actionable.


P0: 3, P1: 4, P2: 4, P3: 2


@lsm lsm left a comment


🤖 Review by claude-opus-4-6 (Anthropic)

Model: claude-opus-4-6 | Client: NeoKai | Provider: Anthropic

Plan Review: Iteration 2 — Space Feature End-to-End Happy Path

This iteration addressed most of the blocking issues from the previous review. The workflow graph is now internally consistent (no Verify→Plan channel), the human gate has dedicated frontend and backend tasks, the Verify/QA ambiguity is resolved, and the worktree isolation strategy is fully specified. The plan is significantly stronger than iteration 1.


P1: Task 5.4 Has an Unnecessarily Tight Dependency

Task 5.4 (Space chat agent task creation from conversation) currently depends on Task 5.2 (human gate frontend). The rationale given: "human gate UI must exist for the full entry-to-gate flow."

This is incorrect. The conversational entry point -- the Space chat agent parsing "implement user authentication" and starting a workflow -- is independent of the approval UI. What Task 5.4 actually needs:

  • Task 2.1 (V2 workflow template exists with the human gate channel defined)
  • Task 5.1 (approve/reject RPC handler exists)

The approval UI (Task 5.2) handles the human-facing experience of the gate itself -- displaying the "waiting for approval" message and parsing "approve"/"reject" chat messages. Task 5.4 is about the entry point -- how the human's initial task request triggers a workflow run.

This dependency matters because M6 and M7 both depend on M5.4, making this part of the critical path. If Task 5.2 slips, it should not block the conversation-to-task flow verification. The dependency should be Task 5.1 (the RPC handler is the real minimum), not Task 5.2.

Fix needed: Change Task 5.4's dependency from Task 5.2 to Task 5.1 (the RPC backend). The conversation entry point can be verified once the RPC exists. Task 5.2 can still be done in parallel or after.


P2: Task 5.1 → Task 5.2 Notification Mechanism Is Fragile

Task 5.1 fires a human_gate_blocked notification event when a human gate blocks. Task 5.2 subscribes to workflow_run_status_changed events to detect waiting_for_approval status.

The plan does not address: what happens if the human approves before the frontend subscription is set up? The human types "approve" in chat, but if the subscription isn't active yet, the approval message might not trigger the correct UI behavior.

The fix is straightforward: the frontend should query the current workflow run status (from the DB/repo) on initial load, not rely solely on event subscription. This is a common pattern for event-driven UIs. Task 5.2 should add a subtask to verify the workflow run status on Space chat load when the run is in waiting_for_approval state.
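
The query-then-subscribe pattern described here can be sketched as follows. All names (`RunStore`, `fetchStatus`, `subscribe`, `watchRun`) are hypothetical, not the actual NeoKai frontend API:

```typescript
type RunStatus = "running" | "waiting_for_approval" | "done" | "failed";

interface RunStore {
  fetchStatus(runId: string): Promise<RunStatus>;             // authoritative DB/repo read
  subscribe(runId: string, cb: (s: RunStatus) => void): void; // live event updates
}

// On Space chat load: read the current status first, then subscribe, so a
// status change that happened before the subscription was active is never missed.
async function watchRun(
  store: RunStore,
  runId: string,
  onStatus: (s: RunStatus) => void
): Promise<void> {
  const current = await store.fetchStatus(runId);
  onStatus(current); // render the gate UI immediately if waiting_for_approval
  store.subscribe(runId, onStatus);
}
```

A small remaining race (an event arriving between the fetch and the subscription) is typically handled by subscribing first and de-duplicating, but the query-on-load step is the essential part the review asks for.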


P2: Task 4.3 QA Feedback Mechanism Needs Slightly More Detail

Task 4.3 states the QA feedback message (the ---QA_RESULT--- block) is included in the channel message to the Coder. However, the plan does not specify how the QA agent delivers its result to the channel router:

  • Does the QA agent call report_done with result: failed: <reason>, which triggers the QA→Code cyclic channel?
  • Or does it call send_message with the structured result?

M2's Reviewer-to-Coder feedback uses send_message (per Task 2.2). M4's QA→Code feedback uses channel routing (per Task 4.3). These are different mechanisms for conceptually the same thing. A brief note in Task 4.3 clarifying the mechanism would eliminate ambiguity.


P2: Task 3.1 SpaceWorktreeManager Investigation Is Unbounded

Task 3.1 says to "investigate" whether the Room WorktreeManager can be reused. The investigation task is not scoped -- there is no "punch out" decision point or criteria for when to give up on reuse and build fresh. If the investigation finds the Room WorktreeManager is not reusable, the task of creating a new manager could be substantial.

Suggestion: Add a subtask 0 to the investigation: "Produce a decision document: reusable (with modifications) or new implementation. Document the rationale." This gives implementers a clear stopping point.


P2: M4 QA→Code vs M2 Reviewer→Code Feedback Mechanisms Are Inconsistent

M2's Reviewer-to-Coder feedback uses send_message. M4's QA→Code feedback uses channel routing. The plan does not explain why they differ. If the distinction is intentional, it should be documented.


What Was Fixed Well

  • Verify→Plan channel: Completely removed from the graph. All cyclic channels route to Code.
  • Human gate UX completeness: Task 5.1 (backend) and Task 5.2 (frontend) are properly separated.
  • Worktree isolation: M3 now explicitly states no isolation exists today and specifies what needs to be built.
  • Verify/QA relationship: M2 removes Verify entirely. M4 adds QA as a new node. No ambiguity.
  • QA feedback topology: Explicitly documented in M4. QA failure goes directly to Code, but the full re-review cycle runs after the fix.
  • maxIterations exhaustion: Fully defined in M2 with explicit behavior.
  • Self-review prevention: Correctly removed from M1 Task 1.3 with rationale.
  • M3→M4 dependency: Correctly removed.
  • M6 test structure: Broken into focused sub-tests with shared helpers.
  • V2 seeding strategy: Fully documented in the overview.
  • Bug triage for M8: Task 8.1 creates a concrete bug list before M8 work begins.

Summary

All three P0 issues from the previous review are resolved. The remaining issues are P1/P2 scopedness concerns rather than missing work. The single most impactful fix is loosening Task 5.4's dependency from Task 5.2 to Task 5.1 -- it unblocks the critical path and correctly separates the entry point concern from the gate UI concern.

Recommendation: APPROVE with the above P1/P2 feedback to address before implementation.


@lsm lsm left a comment


🤖 Review by glm-5-turbo (GLM)

Model: glm-5-turbo | Client: NeoKai | Provider: GLM

Plan Review: Iteration 2 — Substantially Improved, Two P1s Remain

What Was Fixed From Iteration 1

All three P0 issues from the previous review have been addressed:

  1. Verify→Plan routing fix: Confirmed by reading packages/daemon/src/lib/space/workflows/built-in-workflows.ts lines 115-124. The old CODING_WORKFLOW has Verify & Test → Plan (task_result: failed, cyclic) — when verification fails, control returns to the Planner, not the Coder. M2 fixes this by removing Verify entirely and replacing it with Review → Code (failed, cyclic). This is the correct fix. ✓

  2. Human gate UX completeness: Task 5.1 (backend) and Task 5.2 (frontend) now have explicit subtasks covering what the human sees, the RPC handler, state transitions, and approval/rejection parsing. This is substantially improved. ✓

  3. Worktree isolation specification: The "Current State" section in M3 accurately diagnoses that workspacePath: space.workspacePath is passed directly with no worktree isolation, and worktree: false is a feature flag, not actual worktree creation. Task 3.1 now has concrete subtasks for implementation. ✓

The iteration cap behavior, M8 bug triage step, dependency reordering (M4 no longer blocked by M3), and QA feedback topology rationale are all well-specified. Good progress.


P1 Issues (Must Fix Before Implementation)

1. M4 Task 4.2: QA→Code channel definition is not explicit in the task description

The task adds three channels to the V2 workflow template but the QA→Code cyclic channel and its isCyclic: true flag are not called out explicitly. Meanwhile, Task 4.3 "Implement QA-to-Coder Feedback Loop" depends on this channel existing and specifically tests iteration counting on the QA→Code cycle (acceptance criterion: "Global iteration counter increments on both Review→Code and QA→Code cycles").

If the QA→Code channel is not explicitly defined with isCyclic: true in Task 4.2, the iteration-counting implementation will be ambiguous when Task 4.3 is implemented.

Fix: Add an explicit subtask in Task 4.2: "Add channel: QA → Code with isCyclic: true, task_result: failed gate, and description: 'Route QA failures back to Coder for fixes'".
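
Spelled out as data, the requested channel definition would pin down all three properties at once. Field names here are illustrative, mirroring how the plan describes channels rather than the actual template schema:

```typescript
interface ChannelDef {
  from: string;
  to: string;
  gate: { type: "task_result"; value: "passed" | "failed" };
  isCyclic: boolean;
  description: string;
}

const qaToCode: ChannelDef = {
  from: "QA",
  to: "Code",
  gate: { type: "task_result", value: "failed" },
  isCyclic: true, // counts toward the global iteration cap, like Review -> Code
  description: "Route QA failures back to Coder for fixes",
};
```

With `isCyclic: true` explicit in the template, Task 4.3's iteration-counting acceptance criterion has an unambiguous source of truth.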

2. M5 Task 5.1: Post-rejection recovery path is underspecified

The task says on rejection "The human can then provide feedback and restart, or create a new task." This single sentence leaves critical questions unanswered:

  • Can the human restart the same workflow run, or must they create a new task from scratch?
  • If they can restart, does the workflow resume from the Plan node or start fresh?
  • Is there a resume or restart RPC, or does the human simply approve again?
  • Does the iteration counter reset on restart?

The acceptance criteria only test rejection-to-failure, not recovery. This gap matters because the human gate UX must support the full lifecycle including recovery.

Fix: Add a subtask in Task 5.1 covering the recovery flow (restart/resume RPC, state transitions, iteration counter behavior), or add explicit acceptance criteria for the restart scenario.

3. M5 Task 5.4: Conversation-to-task entry point acceptance criteria are not verifiable

The task says "verify the Space chat agent can parse a human request and decide to create a task + start a workflow" but provides no concrete criteria for what "parse" means. The Space chat agent already has start_workflow_run and create_standalone_task MCP tools — what specifically needs to change?

Acceptance criteria like "Human can describe work in the Space chat and have it automatically routed to a workflow" are not testable without defining what happens when:

  • The request is ambiguous ("fix the thing" — which thing?)
  • No matching workflow is found
  • The human requests a non-coding task

Fix: Tighten the acceptance criteria to be concrete and verifiable, e.g., "When the human sends a message containing a coding task description (e.g., 'implement X'), the agent calls create_standalone_task followed by start_workflow_run with CODING_WORKFLOW_V2 within the same conversation turn."


P2 Issues (Important Suggestions)

4. M4 Task 4.3: QA→Review relationship after fix is implicit, not explicit

The spec says after QA fails and Coder fixes, "the full re-review cycle runs: Code → Review → QA → Done". This means after Coder's fix, Review must re-verify before QA re-checks. However, there is no explicit Review → QA channel defined — QA simply activates after Review reports passed.

This is correct behavior (Review is the gatekeeper for QA), but it is not made explicit. An implementer could incorrectly wire QA to activate immediately after Coder's report_done, skipping the Review re-check.

Fix: Add a clarifying note in Task 4.3: "QA activates only after Review reports passed on the re-review, not directly after Coder's report_done. The Review → QA → Done path is gated on Review's passed result."

5. M6 Task 6.1: Test helpers authored by the implementing coder

The task says "Verify all helpers work correctly with dev proxy" but the coder implementing the feature is writing the integration test helpers. For M6 (which validates the entire pipeline), consider whether test helper authorship should be separate from feature implementation to ensure independent validation. (This is a process suggestion, not a plan defect.)


What's Working Well

  • Verify→Plan routing correctly fixed (verified against existing built-in-workflows.ts)
  • Human gate UX well-specified with backend/frontend separation
  • Worktree isolation accurately diagnosed as requiring implementation, not just verification
  • Verify/QA relationship cleanly resolved
  • Iteration cap behavior well-specified with global counter and human notification
  • M8 bug triage step is the right approach for a test-driven milestone
  • Dependency ordering corrected (M4 no longer blocked by M3)

Summary

The plan is close to actionable. The three P0 issues from iteration 1 have all been substantively addressed. The remaining P1s are concrete gaps (QA channel definition, post-rejection recovery, conversation-to-task criteria) that could cause implementation confusion if not clarified first.

Recommendation: REQUEST_CHANGES


P0: 0, P1: 3, P2: 2, P3: 0


@lsm lsm left a comment


🤖 Review by glm-5-turbo (GLM)

Model: glm-5-turbo | Client: NeoKai | Provider: GLM

Plan Review: Iteration 3 — All P0s Resolved, Minor P3 Remains


What Was Fixed From Iteration 2

All three P0 issues from iteration 2 have been substantively addressed:

  1. Task 5.4 dependency loosened to Task 5.1 — Confirmed. Task 5.4 now depends on Task 5.1 (backend RPC) and explicitly does NOT depend on Task 5.2 (frontend). This correctly separates the entry-point concern from the gate-UI concern. The dependency rationale is clear and the critical path is unblocked. ✓

  2. Task 4.2 QA→Code channel now explicit — Confirmed. Task 4.2 subtask 2 explicitly defines the QA→Code channel with isCyclic: true, task_result: failed gate, and description: "Route QA failures back to Coder for fixes". The iteration counter tracking is unambiguous. ✓

  3. Task 5.1 post-rejection recovery + restart RPC specified — Confirmed. The restart RPC is fully specified in subtask 7: validates failed status with humanRejected reason, transitions back to running, resets blocked node to pending, does NOT reset iteration counter. The acceptance criteria in subtask 8 cover this. ✓

  4. Task 5.4 acceptance criteria now verifiable — Confirmed. The five acceptance criteria in lines 175-181 are concrete and testable: clear coding task → agent calls create_standalone_task + start_workflow_run in same turn; ambiguous request → agent asks clarification, does NOT create task; workflow hits gate and transitions to waiting_for_approval. These can be verified with unit tests. ✓

  5. Task 5.2 event subscription robustness fix — Confirmed. Subtask 1 explicitly requires querying the current workflow run status from DB/repo on Space chat load, not just subscribing to events. This handles both the page-refresh case and the approval-before-subscription case. ✓
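The restart semantics confirmed in point 3 (validate `failed` + `humanRejected`, reset the blocked node to pending, preserve the iteration counter) amount to a small state transition. A sketch as a pure function, with all type and function names illustrative rather than the actual RPC handler:

```typescript
// Hypothetical shapes -- the real types live in packages/shared.
type RunStatus = 'pending' | 'in_progress' | 'completed' | 'cancelled' | 'failed';

interface WorkflowRun {
  status: RunStatus;
  failureReason?: string;
  iterationCount: number;
  blockedNodeId?: string;
  nodeStatus: Record<string, 'pending' | 'running' | 'done'>;
}

// Sketch of the restart transition: only a humanRejected failure is
// restartable; the blocked node goes back to pending; the iteration
// counter is deliberately preserved.
function restartRun(run: WorkflowRun): WorkflowRun {
  if (run.status !== 'failed' || run.failureReason !== 'humanRejected') {
    throw new Error('restart is only valid for failed runs with humanRejected');
  }
  const nodeStatus = { ...run.nodeStatus };
  if (run.blockedNodeId) nodeStatus[run.blockedNodeId] = 'pending';
  return {
    ...run,
    status: 'in_progress',
    failureReason: undefined,
    nodeStatus,
    // iterationCount intentionally NOT reset
  };
}
```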

The P2 issues from iteration 2 are also addressed:

  • QA→Review activation ordering (iteration 2 P2 issue 4) — Line 116 in Task 4.3 now explicitly states: "QA activates only after Review reports passed on the re-review, not directly after Coder's report_done." ✓
  • M2/M4 feedback mechanism inconsistency (iteration 2 P2 issue 3) — Task 4.3 subtask 2 now explains the intentional distinction: send_message is for ad-hoc peer feedback, channel routing is for structured state-driven transitions. ✓

P3 Issue (Minor — Does Not Block)

Task 5.1: Feedback-before-restart message ambiguity

Lines 24-26 describe the feedback-before-restart flow: "If the human wants to provide feedback to the planner before restarting, they can type a message in the Space chat, which will be delivered to the node on restart."

This creates a potential edge case: between rejection (workflow goes to failed) and the human calling restart, the Space chat agent sees a message from the human while the workflow is in failed state. The Space chat agent might interpret this as a new task creation request rather than feedback for the blocked Plan node.

For example: human rejects, workflow becomes failed, human types "actually I want to change the approach" — does the Space chat agent:

  • Queue it as feedback for the Plan node on restart (correct behavior per spec)?
  • Treat it as a new create_standalone_task + start_workflow_run request (incorrect but plausible)?

The spec says feedback is "delivered to the node on restart" but does not specify whether the Space chat agent needs to detect that the workflow is in failed state with a pending restart and route the message accordingly. This is a minor edge case — in practice the human would likely call restart first then type feedback, or the feedback is brief ("restart") and the real content comes after. But an implementer could wire this incorrectly without a clarifying note.

Suggestion: Add a brief note in the feedback-before-restart section: "The Space chat agent must detect when the workflow is in failed status with humanRejected and route the human's next message to the blocked Plan node (not interpret it as a new task request), until restart is called."

This is P3 because the correct behavior IS described (feedback delivered on restart) and an implementer would likely get it right, but the routing logic for the interim period is ambiguous.
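The routing guard suggested above could be expressed as a single predicate on the run state; a minimal sketch, assuming the Space chat agent can read the current run status (names hypothetical):

```typescript
type RunState = { status: string; failureReason?: string };

// Sketch: while a run sits in failed/humanRejected awaiting restart,
// incoming human messages are queued as feedback for the blocked node
// instead of being treated as new task-creation requests.
function classifyHumanMessage(
  run: RunState | undefined,
): 'feedback_for_blocked_node' | 'normal_conversation' {
  if (run && run.status === 'failed' && run.failureReason === 'humanRejected') {
    return 'feedback_for_blocked_node';
  }
  return 'normal_conversation';
}
```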


What's Working Well

  • All P0s from iteration 1 and iteration 2 are resolved
  • Workflow graph is internally consistent across all milestones
  • Human gate UX is properly split into backend (5.1) and frontend (5.2)
  • Post-rejection restart RPC is well-specified with iteration counter behavior explicit
  • Conversation-to-task entry point acceptance criteria are concrete and testable
  • QA feedback topology, iteration cap behavior, and M8 bug triage are all well-specified
  • Dependency ordering is correct (M4 does not depend on M3; 5.4 does not depend on 5.2)

Summary

The plan is actionable and ready for implementation. All blocking issues from iterations 1 and 2 have been resolved. The one remaining P3 is a minor edge case in the feedback-before-restart flow that does not affect the core functionality.

Recommendation: APPROVE


P0: 0, P1: 0, P2: 0, P3: 1


@lsm lsm left a comment


🤖 Review by claude-opus-4-6 (Anthropic)

Model: claude-opus-4-6 | Client: NeoKai | Provider: Anthropic

Plan Review: Iteration 3 — Space Feature End-to-End Happy Path

All five focus areas from iteration 2 have been addressed. The plan is significantly clearer and more actionable than iteration 1. My recommendation is APPROVE with the P1/P2 feedback below.


What Was Fixed (Confirmed)

  1. Task 5.4 dependency loosened to Task 5.1: Correct. The conversation entry point only needs the RPC backend to exist. The dependency justification ("the conversation entry point needs the workflow to actually run and hit the gate to prove the full flow") is sound.

  2. Task 4.2 QA→Code channel now explicit: The isCyclic: true flag is now called out explicitly in subtask 2 and verified in subtask 2's "verify" step. Confirmed in the diff.

  3. Task 5.1 post-rejection recovery + restart RPC: The Post-rejection recovery section (new) defines both recovery paths clearly. Subtask 7 specifies the restart RPC with full state transition behavior. Subtask 8 adds the restart unit test. Acceptance criterion covers it.

  4. Task 5.4 acceptance criteria now verifiable: All five verifiable criteria are specific and testable. The "ambiguous request" and "no matching workflow" cases are now covered.

  5. Task 5.2 event subscription robustness: The initial DB query on Space chat load (subtask 1) + live event subscription (subtask 2) pattern is correct.


P1: Task 5.3 Depends on Task 4.2 But Acceptance Criteria Do Not Test QA

Task 5.3 (completion notification and summary) says its acceptance criteria include "QA verification status" but depends on Task 4.2 (which adds QA to the workflow). However, examining the acceptance criteria:

  • "QA verification status" is only mentioned as part of the summary message content
  • There is no test that verifies QA actually ran, or that QA's result was correctly captured in the summary
  • The task's own acceptance criteria are insufficient to verify the Task 4.2 dependency is actually exercised

Fix needed: Add an acceptance criterion in Task 5.3 that explicitly verifies the full QA→Done path in the summary, e.g., "The completion summary includes the QA agent's structured result (---QA_RESULT--- output), confirming QA ran and passed." Without this, Task 5.3 could pass by implementing a completion summary that ignores QA entirely.


P2: Task 5.3 Subtask 5 ("Update Task Agent Prompt") Is Vague

Subtask 5 says "Update the Task Agent prompt to produce a human-readable summary on completion" but provides no specifics:

  • What does the summary format look like?
  • What fields are required?
  • How does the Task Agent access QA's structured output?

Fix needed: Add a brief specification of the summary format in the subtask, e.g., "The summary includes: implemented changes, PR link/status, review summary, QA result (passed/failed with issues), next steps." Without this, the implementer will need to make design decisions that belong in the plan.


P2: Task 8.4 Node Timeout Implementation Path Is Unclear

Task 8.4 specifies per-node timeouts (30 min Coder, 15 min Reviewer/QA, 20 min Planner) as a hardening item. However:

  • Task 2.1 (workflow template) only defines maxIterations on the workflow run — no timeout field is specified
  • Task 8.4 (hardening) adds timeouts but the workflow template created in Task 2.1 would need to support node-level timeout fields
  • The plan doesn't specify whether timeouts are added to the workflow template schema in Task 2.1 or retroactively in Task 8.4

Fix needed: Either (a) add a timeoutMinutes field to the node definition in Task 2.1, or (b) specify in Task 8.4 that node timeouts are enforced by the runtime (TaskAgentManager) using hardcoded defaults, not stored in the workflow template. The latter is simpler and doesn't require a schema change.
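Option (b), runtime-enforced timeouts with hardcoded defaults, could be sketched as a wrapper around the node agent's run; the role names and the AbortController wiring here are assumptions, not the actual TaskAgentManager API:

```typescript
// Hardcoded per-role defaults from Task 8.4, in minutes.
const NODE_TIMEOUT_MINUTES: Record<string, number> = {
  coder: 30,
  reviewer: 15,
  qa: 15,
  planner: 20,
};

// Sketch: wrap a node agent's run in a timeout. On expiry the signal
// aborts the agent; the runtime can then surface a structured failure
// without any schema change to the workflow template.
async function runNodeWithTimeout<T>(
  role: string,
  run: (signal: AbortSignal) => Promise<T>,
): Promise<T> {
  const minutes = NODE_TIMEOUT_MINUTES[role] ?? 30;
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), minutes * 60_000);
  try {
    return await run(controller.signal);
  } finally {
    clearTimeout(timer);
  }
}
```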


P2: Task 4.3 Subtask 2 "Channel Message Payload" Is Ambiguous

Task 4.3 Subtask 2 says "QA agent's ---QA_RESULT--- structured output is included in the channel message payload." The mechanism is report_done with result: 'failed: <reason>', which triggers channel routing. But the phrase "channel message payload" is ambiguous — it could mean the channel routing's message wrapper, not the QA agent's actual content.

Clarification needed: Confirm that ---QA_RESULT--- is the content that flows through the QA→Code channel to the Coder, and that the Coder receives it as a workflow message (not as a peer-to-peer send_message). The M4 graph shows QA→Code as a channel arrow, not a send_message arrow, which is correct — but the subtask text should be unambiguous.


P3: Task 5.4 Ambiguous Request Criterion — Parsing Not Defined

The criterion "When the human sends an ambiguous request ('fix the thing'), the agent responds with a clarification question and does NOT create a task" is verifiable but the plan doesn't specify what constitutes an "ambiguous" request. This is left entirely to the agent's judgment.

Suggestion: Add a brief note that intent disambiguation uses the agent's own judgment (no special parsing logic required), and that this is testable by sending a vague message and verifying no task is created. This sets proper expectations for the implementer.


Summary of Iteration 2 Feedback Status

| Issue | Status |
| --- | --- |
| Task 5.4 dependency loosened (5.2 → 5.1) | Fixed |
| Task 5.2 event subscription robustness | Fixed |
| Task 4.3 QA feedback mechanism detail | Fixed (mechanism note added) |
| Task 3.1 investigation bounded | Fixed (decision criteria added) |
| M2 vs M4 feedback mechanism inconsistency | Documented (intentional distinction) |
| Task 5.1 post-rejection recovery | Fixed (restart RPC specified) |
| Task 5.4 acceptance criteria tightened | Fixed |
| Task 5.3 QA dependency mismatch | Still present (P1) |
| Task 5.3 subtask 5 vagueness | Still present (P2) |
| Task 8.4 timeout implementation path | Still present (P2) |

Recommendation: APPROVE — the plan is actionable and all P0/P1 issues from iterations 1 and 2 are resolved. The remaining P1/P2 issues are scope refinements that do not block implementation.


P0: 0, P1: 1, P2: 3, P3: 1


lsm commented Mar 26, 2026

CI Failure Investigation

Confirmed: pre-existing issue on dev, not caused by this plan PR.

This PR contains only docs/plans/ changes (9 markdown files, 0 code changes). Verified:

  • git diff dev..HEAD --name-only → only docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/ files
  • git status → clean working tree

The failing job (Daemon + Shared Unit Tests (Coverage)) fails due to:

SyntaxError: Export named 'createSdkMcpServer' not found in module 
'@anthropic-ai/claude-agent-sdk@0.2.81'

This is an SDK version mismatch in the dev branch — the code imports createSdkMcpServer but the installed SDK version (0.2.81) doesn't export it. The unit test job was skipped on the latest dev push but ran and failed on this PR (because the CI matrix is different for PR-to-dev vs push-to-dev). This failure would affect any PR targeting dev.


lsm commented Mar 26, 2026

CI Fix PR

Fix PR #992 has been opened targeting dev: fix/sdk-mock-module-isolation.

Root cause: Bun's module isolation between test files invalidates the mock.module() from setup.ts. The test file room-runtime-service-worker-mcp.test.ts uses dynamic await import(...) which resolves the real SDK before the preload mock is re-applied on CI (Ubuntu).

Fix: Re-declare the SDK mock at the top of the test file. Single file change.

Once #992 merges to dev, this plan PR should be rebased and CI should pass.

@lsm lsm force-pushed the plan/space-feature-end-to-end-happy-path-single-space-single-task branch 2 times, most recently from 8253ea9 to 929eed8 Compare March 26, 2026 21:19
lsm added 7 commits March 26, 2026 20:50
…, single workflow

8-milestone plan to make the space workflow happy path work end-to-end:
conversation -> task creation -> workflow run (Plan -> Code -> Review -> QA -> Done)
with human gates, agent-to-agent messaging, and completion detection.
P0 fixes:
- Remove Verify node entirely from V2 workflow; QA replaces Verify from M4
- All cyclic channels route to Code, never to Plan (fixes re-planning on fix)
- Add full human gate UX specification (what human sees, how to approve, RPC)
- Split human gate into backend task (5.1) and frontend task (5.2)
- Expand worktree isolation from "verify" to full investigate+implement

P1 fixes:
- Clarify QA feedback topology (QA→Code direct, full re-review cycle after fix)
- Fix dependency ordering (M4 no longer depends on M3)
- Break M6 monolithic test into 5 focused sub-tests with shared helpers
- Add conversation-to-task flow detail referencing space-chat-agent.ts

P2 fixes:
- Remove self-review prevention from reviewer prompt (not needed for agents)
- Add gh auth verification to QA prompt and session factory
- Define maxIterations exhaustion behavior (failed + human notification)
- Clarify V2 seeding strategy (additive, tagged 'default')
- Add bug triage task (8.1) to scope M8 hardening work
P1 fixes:
- Task 5.4 dependency loosened: Task 5.2 → Task 5.1 (RPC backend is
  the real minimum; frontend UI is not on the critical path)
- Task 4.2: QA→Code channel now explicitly specifies isCyclic: true
- Task 5.1: Post-rejection recovery path fully specified with
  spaceWorkflowRun.restart RPC handler, state transitions, and
  iteration counter preservation
- Task 5.4: Acceptance criteria tightened to be verifiable with
  specific agent call sequences and edge cases

P2 fixes:
- Task 5.2: Initial status query on Space chat load (not just event
  subscription) prevents missed gate notifications
- Task 4.3: Mechanism consistency note — send_message vs channel
  routing distinction is intentional and documented
- Task 4.3: QA→Review relationship made explicit — QA only activates
  after Review reports passed, not directly after Coder
- Task 3.1: WorktreeManager investigation bounded with decision
  criteria and two clear outcomes
- Fix task count from 19 to 28 (actual count across 8 milestones)
- Add WorkflowRunStatus type expansion subtask to M5 Task 5.1
  (adds 'waiting_for_approval' and 'failed' to the existing type)
- Add failureReason field to SpaceWorkflowRun for structured errors
- Add daemon restart rehydration subtask for waiting_for_approval
- Use 'needs_attention' with error metadata for M2 iteration cap
  (type expansion deferred to M5)
- Fix QA worktree git state: use git worktree add with commit SHA
  for detached-HEAD worktrees (PR head may not be on a branch)
- Make QA test command detection actionable: probe package.json
  scripts, Makefile, then fallback to bun test/vitest/npm test
- Add clarifying note: worktree feature flag controls UI only,
  not actual worktree creation (Task 3.2)
- Add placement rationale for Task 5.4: requires full pipeline
  with human gate (M5), not M1/M2
- Add V1-to-V2 migration story: out of scope, V1 retained
- Fix agent type labels: Task 1.2 planner, Task 1.3 reviewer
- Update all 'running'/'failed' status references to use actual
  WorkflowRunStatus values ('in_progress'/'needs_attention'/'failed')
- Confirm buildCoderSystemPrompt() exists (reviewer was incorrect)
Overview now states the final intended status ('failed' with
failureReason) while noting the interim workaround
('needs_attention') used until M5 lands.
Major revision addressing feedback:
- Replace complex state machine with Gate + Channel model (gates have
  persistent data stores, agents read/write via MCP tools)
- One worktree per task with short human-readable names (alpha-3, nova-7)
- Full pipeline: Planning → PR Gate → Plan Review → Human Gate → Coding →
  PR Gate → 3 Parallel Reviewers → Aggregate Gate (3 yes) → QA → Done
- Canvas UI for workflow visualization with clickable human gates showing
  artifacts view and file diffs
- Restructured from 8 to 9 milestones with new M1 (gate data store)
@lsm lsm force-pushed the plan/space-feature-end-to-end-happy-path-single-space-single-task branch from 2d9e878 to f386ec2 Compare March 27, 2026 00:57

@lsm lsm left a comment


🤖 Review by glm-5-turbo (GLM)

Model: glm-5-turbo | Client: NeoKai | Provider: GLM

Summary

This is a thorough, well-structured 9-milestone plan that represents a significant improvement over the previous revision. The gate + channel architecture is a sound simplification over a complex state machine, and the plan addresses most of the feedback from previous review rounds. However, there are several remaining issues that should be resolved before implementation begins.

Recommendation: REQUEST_CHANGES (1 P1, 4 P2 issues)


P1 Issues (Should Fix)

1. WorkflowRunStatus does not include 'failed' -- plan assumes it exists

The plan references a 'failed' status for WorkflowRunStatus in multiple places:

  • 00-overview.md line 188: failed with failureReason: 'maxIterationsReached'
  • 06-human-gate-canvas-ui.md Task 6.1 subtask 5: failureReason: 'humanRejected'
  • 06-human-gate-canvas-ui.md Task 6.1 subtask 6: "Add 'failed' to the type"
  • 09-bug-fixes-and-hardening.md Task 9.4 subtask 1: "transitions to failed"

However, the current WorkflowRunStatus type in packages/shared/src/types/space.ts (line 304) is:

export type WorkflowRunStatus =
  | 'pending'
  | 'in_progress'
  | 'completed'
  | 'cancelled'
  | 'needs_attention';

There is no 'failed' status. The existing status machine in workflow-run-status-machine.ts maps failure scenarios to 'needs_attention' (see the lifecycle comment: "agent failed or gate blocked requiring human action"). Adding 'failed' is a significant type-system change that affects the status machine, repository, RPC handlers, and every consumer of WorkflowRunStatus.

File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/00-overview.md line 188
File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/06-human-gate-canvas-ui.md Task 6.1 subtask 6

Action needed: Either:

  • (a) Add a dedicated task in M1 to extend WorkflowRunStatus with 'failed' and update the status machine (preferred -- this is a cross-cutting change that should be called out explicitly), or
  • (b) Map failure scenarios to the existing 'needs_attention' status with a failureReason field, and remove all references to 'failed' throughout the plan.
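Option (a) would amount to a type change along these lines; a sketch only, since the real union lives in packages/shared/src/types/space.ts and the status machine must change with it:

```typescript
// Current union per packages/shared/src/types/space.ts, extended with
// the 'failed' value the plan assumes (option (a)).
export type WorkflowRunStatus =
  | 'pending'
  | 'in_progress'
  | 'completed'
  | 'cancelled'
  | 'needs_attention'
  | 'failed';

// Structured reason accompanying a terminal 'failed' status.
export type FailureReason = 'maxIterationsReached' | 'humanRejected';

// Helper the status machine would need: 'needs_attention' stays
// recoverable, while 'failed' joins the terminal states.
export function isTerminalStatus(s: WorkflowRunStatus): boolean {
  return s === 'completed' || s === 'cancelled' || s === 'failed';
}
```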

P2 Issues (Important Suggestions)

2. Gate data store persistence strategy is underspecified

Task 1.1 subtask 5 mentions two options: "JSON column on the channel/gate record, or a separate gate_data table keyed by runId + gateId" but never picks one. This is an architectural decision that affects:

  • Query patterns (do you need to read gate data independently of the channel definition?)
  • Concurrency (SQLite JSON column updates vs. row-level locks)
  • Schema migration complexity

File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/01-gate-data-store.md Task 1.1 subtask 5

Action needed: Pick one approach and state the rationale. A separate gate_data table seems more appropriate because: (a) gate data changes frequently during a run while channel definitions are static, (b) gate data is per-run while channels are per-workflow, and (c) a separate table allows atomic reads/writes without deserializing a JSON blob from the channel definition.
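Under the separate-table option, the access pattern would look roughly like this; an in-memory sketch of the repository contract, with the real store being a SQLite `gate_data` table keyed by `runId + gateId`:

```typescript
type GateData = Record<string, unknown>;

// Sketch of the gate-data repository contract: per-run, per-gate
// records that can be read and written independently of the static
// channel definitions.
class InMemoryGateDataStore {
  private rows = new Map<string, GateData>();

  private key(runId: string, gateId: string): string {
    return `${runId}:${gateId}`;
  }

  read(runId: string, gateId: string): GateData {
    return this.rows.get(this.key(runId, gateId)) ?? {};
  }

  // Merge-write: each agent contributes its own fields without
  // clobbering other writers (e.g. individual reviewer votes).
  write(runId: string, gateId: string, patch: GateData): GateData {
    const next = { ...this.read(runId, gateId), ...patch };
    this.rows.set(this.key(runId, gateId), next);
    return next;
  }

  // Reset on cyclic re-activation so stale data cannot leak into
  // the next iteration.
  reset(runId: string, gateId: string): void {
    this.rows.delete(this.key(runId, gateId));
  }
}
```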

3. Shared Aggregate Gate semantics need clarification

The plan states that three reviewer channels share a single Aggregate Gate:

Reviewer 1 -> QA [Aggregate Gate: quorum=3]
Reviewer 2 -> QA [Aggregate Gate: quorum=3]
Reviewer 3 -> QA [Aggregate Gate: quorum=3]

And 03-extended-coding-workflow.md Task 3.3 subtask 4 says: "The Aggregate Gate evaluates after each write -- only activates QA when quorum is met."

But the channel definition table (line 42) also shows a separate cyclic channel:

Reviewer -> Coding [Aggregate Gate: on reject, cyclic]

This creates ambiguity: when a reviewer rejects, do they write { approve: 'reject' } to the same shared Aggregate Gate data, or to a different gate? If it's the same gate, then a single reject vote permanently pollutes the gate data for all three channels. If it's a different gate, the plan needs to define how many gates exist and their relationships.

File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/03-extended-coding-workflow.md channel definition table (line 42)

Action needed: Explicitly define whether the reject feedback uses the same gate data store as the approve-quorum gate, or a separate gate. If separate, update the channel table to show two distinct gates with different IDs.

4. Task 4.2 subtask 1 is open-ended investigation, not a bounded task

"Decide: reuse (with modifications) or new implementation" with criteria "if reusing requires >3 non-trivial modifications" is not a concrete acceptance criterion. The existing WorktreeManager in packages/daemon/src/lib/worktree-manager.ts is Room-specific (uses sessionId, room-oriented metadata). The plan's SpaceWorktreeManager has fundamentally different semantics (one per task, shared by all agents, short human-readable names, tied to a SpaceTask not a Session).

File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/04-worktree-isolation.md Task 4.2 subtask 1

Action needed: Just commit to creating a new SpaceWorktreeManager in packages/daemon/src/lib/space/. The existing WorktreeManager uses simple-git and has room-specific abstractions (sessionId, WorktreeMetadata). The Space system needs task-scoped worktrees with different lifecycle management. Reuse low-level git operations via simple-git directly (same dependency), but don't try to extend the Room's WorktreeManager class.

5. Node agent prompt specialization (M2) should happen AFTER V2 workflow template (M3), not before

M2 depends on M1 (gate MCP tools must exist), but M2 also implicitly depends on M3 because the prompts reference specific gate IDs and channel names from the V2 workflow. For example:

  • Task 2.1 says "the coder must call write_gate to write PR data to the Code PR Gate" -- but the Code PR Gate only exists in the V2 workflow template defined in M3.
  • Task 2.3 says "reviewer writes vote to the Aggregate Gate" -- same issue.

If M2 is implemented first, the prompt builders would need to reference gate IDs that don't exist yet. In practice, the implementer would need to coordinate across milestones anyway.

File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/02-enhanced-node-agent-prompts.md

Action needed: Either reorder M2 after M3, or add a note that M2 prompts use generic gate references (e.g., "the downstream gate") rather than specific gate names, with M3 providing the concrete wiring.


P3 Issues (Nits)

6. Plan file numbering mismatch with PR description

The PR description lists 01-enhanced-node-agent-prompts.md through 08-bug-fixes-and-hardening.md (8 files, matching the old 8-milestone structure), but the actual PR contains 01-gate-data-store.md through 09-bug-fixes-and-hardening.md (9 files matching the revised structure). The PR description file list is stale.

Action needed: Update the PR description to match the actual file list.

7. Iteration cap on QA re-review cycle could be clearer

Task 5.1 subtask 4 says "full re-review cycle: Coding -> 3 Reviewers -> QA (all 3 reviewers must re-vote)" after a QA failure. But do the reviewers' previous votes get cleared when QA fails and sends the Coder back? If not, the Aggregate Gate might already have 3 approve votes from the previous round, and the re-review would immediately pass without the reviewers actually re-reviewing.

File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/05-qa-agent-node.md Task 5.1 subtask 4

Action needed: Add a note that gate data (specifically the Aggregate Gate votes) must be reset when the Coder re-activates via a cyclic channel. Otherwise, stale approve votes from a previous round could short-circuit the re-review.


What's Done Well

  • The gate + channel model is a genuine simplification. Gates as self-evaluating conditions with persistent data stores is cleaner than a workflow-wide state machine.
  • The cyclic channel routing design (all feedback loops route back to Coding, never to Planning) is well-thought-out and prevents unnecessary re-planning.
  • The worktree isolation strategy (one per task, short human-readable names) is pragmatic and avoids over-engineering.
  • The iteration cap is correctly scoped as a global counter per workflow run, not per-channel.
  • The phased approach (Phase 1: architecture, Phase 2: UI + QA, Phase 3: testing) has correct dependency ordering (with the M2/M3 caveat noted above).
  • Milestone 9 (bug triage before fixing) is a good pattern that prevents scope creep.

- The human gate only fires once (Plan Review → Coding), not on every iteration
- The Coder can iterate on feedback from both reviewers and QA independently

**Iteration cap**: `maxIterations` is a global counter on the workflow run, incremented each time ANY cyclic channel is traversed. When the cap is reached, the workflow transitions to `failed` with a `failureReason` of `'maxIterationsReached'`.

P1: WorkflowRunStatus does not include 'failed'

The plan references failed as a WorkflowRunStatus value here and in multiple other milestones (M6 Task 6.1, M9 Task 9.4). However, the current type in packages/shared/src/types/space.ts:304 only has:

| 'pending' | 'in_progress' | 'completed' | 'cancelled' | 'needs_attention'

The existing status machine maps failure scenarios to 'needs_attention'. Adding 'failed' is a cross-cutting type-system change that needs an explicit task, not just a subtask buried in M6.

Action: Add a dedicated task in M1 to extend WorkflowRunStatus with 'failed' and 'failureReason', and update workflow-run-status-machine.ts accordingly. All downstream milestones reference this status.

2. Add new gate types: `pr` and `aggregate`
3. Add `GateData` type and `data: GateData` field to the gate/channel interface
4. Add `GateConfig` discriminated union with per-type config (e.g., `quorum` for aggregate)
5. Update the SQLite schema for workflow runs to persist gate data (JSON column on the channel/gate record, or a separate `gate_data` table keyed by `runId + gateId`)

P2: Persistence strategy is not decided

This subtask lists two options (JSON column vs. separate table) but never picks one. This is an architectural decision with downstream effects on query patterns, concurrency, and migration complexity.

A separate gate_data table (keyed by runId + gateId) seems more appropriate because gate data changes frequently during a run while channel definitions are static, gate data is per-run while channels are per-workflow, and a separate table allows atomic reads/writes.

Action: Pick one approach and state the rationale here.

| Reviewer 3 → QA | Aggregate | `{ quorum: 3 }` | no | Same aggregate gate (shared) |
| QA → Done | Task Result | `{ expression: 'passed' }` | no | QA passes |
| QA → Coding | Task Result | `{ expression: 'failed' }` | yes | QA fails, feedback to coder |
| Reviewer → Coding | Aggregate | on reject | yes | Any reviewer rejects, feedback to coder |

P2: Shared Aggregate Gate ambiguity

This channel definition says "Reviewer -> Coding [Aggregate Gate: on reject, cyclic]" but the three Reviewer->QA channels above all share the same Aggregate Gate (quorum=3). When a reviewer rejects:

  1. Do they write to the same Aggregate Gate data store as the approve votes? If so, a single reject pollutes the shared gate and the quorum logic breaks.
  2. Or is there a separate gate for the reject channel? If so, the plan needs to show two distinct gate instances.

Additionally: when the Coder re-activates after a QA failure (M5 Task 5.1 subtask 4 says "full re-review cycle"), do the Aggregate Gate votes get reset? If not, stale approve votes from the previous round could cause the gate to pass immediately without actual re-review.

Action: Define whether reject feedback uses a separate gate or the same gate. Add a note about vote reset on cyclic re-activation.
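One way to keep the approve-quorum and reject-feedback concerns unambiguous is a single shared vote store with two separate evaluators; a sketch under the assumption that votes are keyed by reviewer id:

```typescript
type Vote = 'approve' | 'reject';
type Votes = Record<string, Vote>; // keyed by reviewer id

// Quorum evaluator for the Reviewer -> QA channels: activates QA only
// when `quorum` distinct approvals exist.
function quorumMet(votes: Votes, quorum: number): boolean {
  return Object.values(votes).filter((v) => v === 'approve').length >= quorum;
}

// Reject evaluator for the cyclic Reviewer -> Coding channel: fires on
// any single reject, without breaking the quorum logic.
function anyReject(votes: Votes): boolean {
  return Object.values(votes).some((v) => v === 'reject');
}

// Cyclic re-activation must clear votes so a stale quorum from the
// previous round cannot short-circuit the re-review.
function resetVotes(): Votes {
  return {};
}
```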

**Description**: Create a `SpaceWorktreeManager` that manages git worktrees for Space tasks. One worktree per task, created from the space's repository.

**Subtasks**:
1. **Investigate existing WorktreeManager**: Read `packages/daemon/src/lib/room/managers/worktree-manager.ts` and decide: reuse (with modifications) or new implementation.

P2: Open-ended investigation subtask

Subtask 1 says "Decide: reuse (with modifications) or new implementation" with criteria ">3 non-trivial modifications." The existing WorktreeManager in packages/daemon/src/lib/worktree-manager.ts is Room-specific (uses sessionId, room-oriented metadata, WorktreeMetadata type). The Space system needs fundamentally different semantics (task-scoped, short human-readable names, shared by multiple agents).

Action: Just commit to creating a new SpaceWorktreeManager in packages/daemon/src/lib/space/. Reuse simple-git (same dependency) for low-level git operations, but don't try to extend the Room's class. Remove the investigation subtask and replace with a direct implementation task.

2. Add bypass markers section (RESEARCH_ONLY, VERIFICATION_COMPLETE, etc.) for role 'coder'
3. Add review feedback handling: how to fetch GitHub reviews, verify feedback, push fixes
4. Add PR creation flow with duplicate prevention (`gh pr list --head`)
5. **Add gate interaction instructions**: After creating a PR, the coder must call `write_gate` to write PR data (`{ prUrl, prNumber, branch }`) to the Code PR Gate. This unblocks the reviewer channel.

P2: Implicit dependency on M3

Task 2.1 references "the Code PR Gate" and Task 2.3 references "the Aggregate Gate" -- but these gates only exist in the V2 workflow template defined in M3 (Task 3.1). If M2 is implemented first, the prompt builders would reference gate IDs that don't exist yet.

Action: Either reorder M2 after M3, or add a note that M2 prompts use generic references (e.g., "the downstream gate for this step") rather than specific V2 gate names. M3 would then provide the concrete wiring.

@@ -0,0 +1,154 @@
# Milestone 1: Gate Data Store and New Gate Types

P3: Stale PR description file list

The PR description lists files 01-enhanced-node-agent-prompts.md through 08-bug-fixes-and-hardening.md (old 8-milestone structure), but the actual PR contains 01-gate-data-store.md through 09-bug-fixes-and-hardening.md (revised 9-milestone structure). The file list needs updating.

1. Verify QA node exists in V2 template with correct agent assignment and tool access
2. Verify channels: Aggregate→QA (passes on quorum), QA→Done (passes on 'passed'), QA→Coding (fails on 'failed', cyclic)
3. Test the QA→Coding feedback loop: QA writes `{ result: 'failed', summary: '...' }` to Task Result Gate → cyclic channel activates → Coding node re-activates
4. Verify the full re-review cycle after QA failure: Coding → 3 Reviewers → QA (all 3 reviewers must re-vote)

P3: Aggregate Gate vote reset on cyclic re-activation

When QA fails and sends the Coder back (cyclic channel), then the Coder fixes and the 3 reviewers re-review -- do the previous approve votes in the Aggregate Gate get cleared? If not, the gate might already have 3 approve votes from the previous round and would immediately pass without the reviewers actually re-reviewing the new code.

Action: Add a note that gate data must be reset (votes cleared) when a cyclic channel fires, to prevent stale votes from short-circuiting re-review.


@lsm lsm left a comment


Review by claude-opus-4-6 (Anthropic)

Model: claude-opus-4-6 | Client: NeoKai | Provider: Anthropic

This is a comprehensive plan for the Space feature end-to-end happy path. The gate+channel architecture is well-designed and simpler than the state machine approach it replaces. Below is my detailed review organized by milestone.


M1: Gate Data Store and New Gate Types

Strengths

  • Clear separation of concerns: gates have data stores, evaluators are simple pure functions
  • MCP tools (read_gate, write_gate) are the right abstraction for agent-to-gate interaction
  • Permission model is well-scoped (role-based write permissions)

Issues

P1: Task 1.3 permission model is underspecified

The permission model says "which agent roles can write to which gate types" but gates are identified by gateId, not type. The plan needs to clarify:

  • How is the "writer" agent identified at write_gate call time? By the myRole in the MCP server config?
  • What happens when an unauthorized agent attempts to write? Return error or silently fail?
  • How are gate IDs mapped to allowed writer roles?

Suggestion: Add a subtask to define the gate_permissions schema (gateId -> allowedWriterRoles) and the authorization check in write_gate.
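A minimal TypeScript sketch of what that subtask could produce. The `GatePermissions` shape and `authorizeWrite` helper are illustrative names, not from the plan; the point is that the `gateId -> allowedWriterRoles` mapping and the explicit-error behavior are both pinned down:

```typescript
// Hypothetical sketch of the suggested gate_permissions check.
// Names (GatePermissions, authorizeWrite) are illustrative.
type AgentRole = 'planner' | 'coder' | 'reviewer' | 'qa' | 'human';

type GatePermissions = Record<string, AgentRole[]>; // gateId -> allowedWriterRoles

const permissions: GatePermissions = {
  'code-pr-gate': ['coder'],
  'review-votes-gate': ['reviewer'],
};

// Returns an explicit error (rather than silently failing) on unauthorized writes.
function authorizeWrite(
  perms: GatePermissions,
  gateId: string,
  myRole: AgentRole,
): { ok: true } | { ok: false; error: string } {
  const allowed = perms[gateId];
  if (!allowed) return { ok: false, error: `Unknown gate: ${gateId}` };
  if (!allowed.includes(myRole)) {
    return { ok: false, error: `Role '${myRole}' may not write to '${gateId}'` };
  }
  return { ok: true };
}
```

`write_gate` would call this before persisting, with `myRole` taken from the MCP server config as the review asks.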

P1: Gate ID resolution is unclear

Task 1.3 mentions read_gate({ gateId }) but does not explain how agents discover gate IDs. The V2 workflow has multiple gates (Plan PR Gate, Code PR Gate, Aggregate Gate) - how does an agent know which ID to use?

Suggestion: Add a list_gates MCP tool or inject gate IDs into the agent's task message via workflow context. The overview mentions this implicitly but it should be explicit in M1 or M2.

P2: Aggregate Gate vote key collision

The Aggregate Gate design uses { votes: { [agentId]: 'approve' | 'reject' } }. If the same reviewer agent spawns twice (e.g., after a crash and re-spawn), it could overwrite its previous vote. Consider using reviewer_node_id or a unique vote ID instead of agent ID.


M2: Enhanced Node Agent Prompts

Strengths

  • Leveraging existing Room system prompts is the right approach
  • Clear breakdown of what each role needs (coder, planner, reviewer, QA)

Issues

P1: Missing list_gates / gate discovery in prompts

Task 2.1 subtask 6 says "Add gate interaction instructions" but does not mention how agents discover gate IDs. The prompts should include:

  • How to enumerate available gates (e.g., via a list_gates tool or from workflow context)
  • Which gate IDs correspond to which workflow stage (Plan PR Gate vs Code PR Gate)

P2: Reviewer parallel voting edge cases

Task 2.3 says "When 3 reviewers all write 'approve', the Aggregate Gate passes" but does not address:

  • What if a reviewer is killed mid-review? Does the vote persist?
  • What if 2 approve and 1 rejects before the 3rd votes?

Suggestion: Add guidance for reviewers to check current vote state before voting (via read_gate on the Aggregate Gate).


M3: Extended Coding Workflow (V2)

Strengths

  • Clear node/channel topology with good cycle handling
  • All cyclic channels route to Coding (not Planning) - correct design
  • maxIterations: 5 is reasonable for this pipeline length

Issues

P1: Aggregate Gate quorum reset on cyclic traversal

When a reviewer rejects and the workflow cycles back to Coding, the Aggregate Gate's votes should be reset. The plan does not specify this. After the coder fixes, all 3 reviewers must re-vote from scratch. Task 1.4 mentions "incremental votes" but not reset behavior.

Suggestion: Add explicit subtask in M3 to clear Aggregate Gate votes when entering a cyclic channel, or specify that the gate's evaluate() checks only votes from the current iteration.
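A toy sketch of the first option, assuming the router owns the reset (the `gateStore` map and `onCyclicTraversal` hook are illustrative, not the plan's actual code):

```typescript
// Illustrative: clearing vote state when a cyclic channel fires, so stale
// approvals from the previous round cannot short-circuit re-review.
interface GateData {
  votes?: Record<string, 'approve' | 'reject'>;
  [key: string]: unknown;
}

const gateStore = new Map<string, GateData>();

// Hypothetical hook the channel router could call on cyclic traversal,
// passing the gate IDs configured for reset.
function onCyclicTraversal(resetGateIds: string[]): void {
  for (const gateId of resetGateIds) {
    gateStore.set(gateId, { votes: {} }); // back to the initial, empty-vote state
  }
}
```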

P2: Shared PR Gate across 3 channels

The channel table shows Coding -> Reviewer 1/2/3 all share the same PR Gate. This is mentioned in Task 1.4, but it should be made explicit that the same gateId is used for all 3 reviewer channels.


M4: Worktree Isolation

Strengths

  • One worktree per task (not per agent) is the right scope
  • Short human-readable names (alpha-3, nova-7) improve UX
  • Investigating existing WorktreeManager before creating new one is good engineering

Issues

P1: Worktree cleanup timing is underspecified

Task 4.3 says "On workflow run completion (Done node), call removeTaskWorktree()" but:

  • What if the PR has not been merged yet? The worktree is still needed for follow-up commits.
  • What if the workflow is in the "Done" state but human wants to review the code locally?

Suggestion: Consider keeping the worktree until the PR is merged or explicitly deleted by the human. Add a keepWorktree flag or defer cleanup to a TTL-based reaper.

P2: Branch naming includes taskId

Task 4.2 says branch naming is space/{taskId}/{worktree-name}. Task IDs are UUIDs, making branch names very long. Consider space/{worktree-name} since worktree names are already unique.


M5: QA Agent Node

Strengths

  • Clear feedback topology (QA -> Coding, not QA -> Review)
  • Completion flow is well-defined

Issues

P1: QA re-review cycle does not address Aggregate Gate reset

Task 5.1 subtask 4 says "Verify the full re-review cycle after QA failure: Coding -> 3 Reviewers -> QA (all 3 reviewers must re-vote)". This confirms the reset requirement from M3 P1, but the plan should specify WHO resets the votes (TaskAgentManager? The gate itself on cyclic traversal?).


M6: Human Gate Canvas UI

Strengths

  • Canvas visualization is the right UX for human-in-the-loop
  • Artifacts view with diff rendering matches GitHub PR review UX
  • Chat-based approval as fallback is good for accessibility

Issues

P1: getGateArtifacts implementation complexity

Task 6.1 says the RPC uses git diff and git status in the task worktree. This requires:

  • The worktree to exist (what if it was cleaned up?)
  • The worktree path to be queryable from the gate/run ID

The plan should add a subtask to store worktreePath in the workflow run config or a space_worktrees table lookup.

P2: Concurrent human gate approval

What happens if two humans click Approve simultaneously? The plan says gate data shows { waiting: true } but does not address concurrent approval. Add idempotency key or optimistic locking.
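A minimal optimistic-locking sketch of the second option, under the assumption that gate data carries a version counter (`VersionedGate` and `approveGate` are illustrative names):

```typescript
// Two humans click Approve concurrently: each read the gate at version 0,
// but only the first compare-and-set succeeds; the stale write is rejected.
interface VersionedGate {
  version: number;
  data: Record<string, unknown>;
}

const humanGate: VersionedGate = { version: 0, data: { waiting: true } };

function approveGate(gate: VersionedGate, expectedVersion: number): boolean {
  if (gate.version !== expectedVersion) return false; // stale read: caller retries or no-ops
  gate.data = { waiting: false, approved: true };
  gate.version += 1;
  return true;
}
```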


M7: Online Integration Test

Strengths

  • Test file structure is well-organized
  • Shared helpers reduce boilerplate
  • Edge case coverage is comprehensive

Issues

P1: Test 7.4 "Full Pipeline" runs real agents end-to-end

The full pipeline test runs Planning -> Code -> Review -> QA -> Done with real agent execution. This will be slow and potentially flaky. Consider:

  • Adding a "mock agent" mode that simulates completion + gate writes
  • Or splitting into smaller tests that mock gate writes directly

P2: Task 7.5 edge case "Aggregate gate partial + daemon restart" is underspecified

How does the test verify that "3rd vote still counted"? The test needs to:

  1. Write 2 votes
  2. Restart daemon
  3. Write 3rd vote
  4. Verify QA activates

This requires the votes to persist in SQLite across restart, which they should if stored in gate data. Make this explicit.
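A toy simulation of why the test passes only if votes are written through to storage. `persisted` stands in for the SQLite-backed gate data; `restartDaemon` rebuilds in-memory state from it, mirroring the steps above (names are illustrative, not the real harness):

```typescript
// Simulated persistence: votes survive a "restart" only because each write
// is flushed to storage, as the plan's gate data store is expected to do.
type Votes = Record<string, 'approve' | 'reject'>;

let gateData: { votes: Votes } = { votes: {} };

// Stand-in for the SQLite row holding this gate's data.
let persisted = JSON.stringify(gateData);

function writeVote(nodeId: string, vote: 'approve' | 'reject'): void {
  gateData.votes[nodeId] = vote;
  persisted = JSON.stringify(gateData); // persist on every write
}

function restartDaemon(): void {
  gateData = JSON.parse(persisted); // in-memory state rebuilt from storage
}

function quorumReached(quorum: number): boolean {
  return Object.values(gateData.votes).filter((v) => v === 'approve').length >= quorum;
}
```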


M8: E2E Test

Strengths

  • Full happy path through canvas UI is the right test
  • Reviewer feedback loop test is critical

Issues

P1: Test 8.1 relies on real workflow execution

Same issue as M7 P1 - this test will be slow. Consider adding a "fast-forward" mode that bypasses agent execution for UI testing.

P2: Task 8.2 "Reviewer rejects" test is timing-sensitive

The test must detect when a reviewer has rejected (visible on canvas). How does it know when to check? Add explicit "wait for Aggregate Gate state" steps.


M9: Bug Fixes and Hardening

Strengths

  • Error handling categories are comprehensive
  • Per-node timeouts are good guardrails

Issues

P2: "Gate data corruption recovery" is underspecified

Task 9.4 says "reset to initial state and log error" but does not define "initial state" for each gate type:

  • PR Gate: {} (no PR URL)?
  • Aggregate Gate: { votes: {}, quorum: 3 }?
  • Human Gate: { waiting: true }?

Add explicit reset schemas per gate type.


Summary

This is a well-structured plan with clear milestones and dependencies. The gate+channel architecture is the right abstraction. The main gaps are:

  1. Gate discovery (P1): Agents need a way to discover gate IDs at runtime
  2. Aggregate Gate reset on cycles (P1): Cyclic traversals must reset vote state
  3. Worktree cleanup timing (P1): Consider keeping worktrees until PR merge
  4. Permission model details (P1): Authorization checks need more specification
  5. Test execution speed (P2): Full pipeline tests will be slow without mock modes

Recommendation: REQUEST_CHANGES - The P1 issues should be addressed before implementation begins.

P1 fixes:
- Use existing 'needs_attention' status (not 'failed') for all failure
  scenarios, with structured failureReason field
- Add list_gates MCP tool and workflow context injection for gate
  discovery
- Implement gate data reset on cyclic channel traversal (Aggregate
  votes cleared, PR data preserved)
- Specify gate write permission model with allowedWriterRoles and
  clear error messages
- Fix worktree cleanup: keep until PR merged (TTL-based reaper),
  immediate cleanup only on cancellation
- Add worktree path lookup for getGateArtifacts RPC
- Specify who resets Aggregate Gate votes (ChannelRouter on cycle)

P2 fixes:
- Pick gate_data table (not JSON column) for persistence
- Separate reject gate from aggregate gate with distinct IDs
- Commit to new SpaceWorktreeManager (not reusing Room's)
- Reorder M2 after M3 (prompts need concrete gate IDs)
- Use nodeId (not agentId) as Aggregate Gate vote key
- Shorten branch naming to space/{worktree-name}
- Add concurrent approval idempotency (optimistic locking)
- Specify aggregate gate partial + restart test steps
- Add wait-for-gate-state steps in E2E tests
- Define gate data corruption reset schemas per type

@lsm lsm left a comment


🤖 Review by glm-5-turbo (GLM)

Model: glm-5-turbo | Client: NeoKai | Provider: GLM


Verification of Previous Fix Claims

Fix 1: WorkflowRunStatus — uses needs_attention instead of adding failed (P1 from round 1)

VERIFIED — correctly applied across all files.

  • 00-overview.md (lines 69-75): Explicitly states the strategy — uses existing needs_attention with failureReason field.
  • 01-gate-data-store.md (line 58): Task 1.1 subtask 7 adds failureReason to SpaceWorkflowRun.
  • 03-extended-coding-workflow.md (line 57): References needs_attention with failureReason: 'maxIterationsReached'.
  • 05-qa-agent-node.md (line 19): Correctly uses needs_attention in iteration cap context.
  • 06-human-gate-canvas-ui.md (line 61): Correctly uses needs_attention with failureReason: 'humanRejected'.
  • 07-online-integration-test.md (lines 54, 150): Test assertions verify needs_attention with failureReason.
  • 09-bug-fixes-and-hardening.md (line 74): Uses needs_attention with failureReason: 'agentCrash'.

The existing WorkflowRunStatus type (packages/shared/src/types/space.ts:304) is 'pending' | 'in_progress' | 'completed' | 'cancelled' | 'needs_attention' — the plan correctly avoids adding 'failed'.

Fix 2: Gate data persistence — dedicated gate_data table (P2 from round 1)

VERIFIED — consistently applied.

  • 00-overview.md (line 46): Explicitly states the gate_data table keyed by runId + gateId, with four reasons for separating from channel definitions.
  • 01-gate-data-store.md (line 56): Task 1.1 subtask 5 creates a dedicated gate_data table with JSON column, keyed by (run_id, gate_id).
  • 01-gate-data-store.md (line 115): write_gate MCP tool persists to gate_data table.
  • 07-online-integration-test.md (line 152): Edge case test specifically verifies gate data survives daemon restart via gate_data table.

Fix 3: Aggregate Gate reject-vs-approve — separate gate instances (P2 from round 2)

VERIFIED — correctly implemented.

  • 03-extended-coding-workflow.md (lines 30-49): The channel table clearly defines:
    • review-aggregate-gate (Aggregate type, quorum=3) — for tracking approve votes
    • review-reject-gate (Task Result type, expression='rejected') — separate gate for rejection feedback
  • Lines 46-49: Explicit clarification section explaining they are separate instances with different IDs and different evaluation logic.
  • Line 51: Gate data reset on cycles correctly resets both gates.
  • 00-overview.md (line 209): Vote data structure matches — { votes: { [nodeId]: 'approve' | 'reject' } }.

Fix 4: M2 depends on M3 (P2 from round 2)

VERIFIED — correctly applied.

  • 00-overview.md (line 224): "Milestone 2 (prompts) depends on M1 (agents need read_gate/write_gate/list_gates instructions) AND M3 (prompts reference specific gate IDs from the V2 workflow template). M2 should be implemented after M3."
  • 02-enhanced-node-agent-prompts.md (line 7): Dependency note explicitly states M2 depends on both M1 and M3.
  • All tasks in M2 (Tasks 2.1-2.4) list dependencies on M1 and M3.
  • M3 Task 3.1 has no dependency on M2 (correct — M3 only depends on M1).

New Issues Found

P2: GateConfig discriminated union omits the existing condition type

01-gate-data-store.md (lines 27-32) defines GateConfig as:

type GateConfig =
  | { type: 'always' }
  | { type: 'task_result'; expression: string }
  | { type: 'human' }
  | { type: 'pr' }
  | { type: 'aggregate'; quorum: number };

The existing WorkflowConditionType (packages/shared/src/types/space.ts:460) is 'always' | 'human' | 'condition' | 'task_result'. The plan's GateConfig drops the condition type entirely. While no built-in workflow currently uses condition, it is a supported gate type with a working evaluator in ChannelGateEvaluator. The plan should either:

  • (a) Include it in GateConfig to maintain backward compatibility, or
  • (b) Explicitly state that condition gates are deprecated/out of scope for V2.

The plan's Task 1.1 subtask 1 (line 52) says to "audit the existing gate types" including condition, but the resulting GateConfig union silently drops it. This could cause the ChannelGateEvaluator's exhaustive switch to break if existing workflows using condition gates are loaded.
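A sketch of option (a): extend the plan's union with the existing `condition` type and guard the evaluator's switch with a `never` check, so dropping a variant becomes a compile error. The `describeGate` helper and the `condition` arm's `expression` field are illustrative:

```typescript
// GateConfig from the plan, plus the existing 'condition' type kept for
// backward compatibility with workflows already using it.
type GateConfig =
  | { type: 'always' }
  | { type: 'task_result'; expression: string }
  | { type: 'human' }
  | { type: 'pr' }
  | { type: 'aggregate'; quorum: number }
  | { type: 'condition'; expression: string };

function describeGate(config: GateConfig): string {
  switch (config.type) {
    case 'always': return 'always passes';
    case 'task_result': return `task result matches ${config.expression}`;
    case 'human': return 'waits for human approval';
    case 'pr': return 'waits for PR data';
    case 'aggregate': return `needs ${config.quorum} votes`;
    case 'condition': return `custom condition ${config.expression}`;
    default: {
      const _exhaustive: never = config; // compile error if a variant is removed
      return _exhaustive;
    }
  }
}
```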

P2: Gate write authorization model has a gap — who writes to the reject gate?

03-extended-coding-workflow.md (line 44) defines review-reject-gate with allowedWriterRoles presumably ['reviewer']. But the overview (00-overview.md line 62) lists allowedWriterRoles only for 5 gates and omits both review-reject-gate and qa-fail-gate. The allowedWriterRoles table in the overview should be expanded to include all gate instances (currently lists only 5, but the V2 workflow has 8 distinct gate instances). Without this, implementers may miss setting write permissions on these gates, allowing unauthorized writes.

P2: Dual-numbered subtask in M5 Task 5.1

05-qa-agent-node.md has two subtasks numbered 6:

  • Line 33: "6. Verify iteration counter increments on QA→Coding cycle"
  • Line 34: "6. Unit tests for QA feedback loop"

This is a minor numbering error that could cause confusion during implementation. The second should be 7.

P3: Plan Review node uses single reviewer but pipeline diagram shows 3 parallel reviewers

03-extended-coding-workflow.md (line 22) says "Plan Review | reviewer | no (single for MVP)", and the overview pipeline (line 12) shows "Plan Review (reviewer agents)" implying multiple. The single reviewer for plan review is a deliberate simplification, but the overview diagram should match. Consider updating the overview pipeline to say "Plan Review (single reviewer)" for clarity.

P3: M5 Task 5.3 (Space Chat Agent task creation) scope creep

Task 5.3 is about ensuring the Space chat agent can create tasks from conversation and start V2 workflows. This feels like it belongs in an earlier milestone or in M6 (human gate UI), since the chat agent is the entry point and should be wired up before the canvas UI. Currently M6 depends on M5, but the chat agent's ability to create tasks and start workflows is a prerequisite for testing anything downstream. This is not blocking — just a sequencing note.


Summary

All four previously reported issues (P1/P2) have been correctly fixed and the fixes are consistently applied across all milestone files. The plan is well-structured and the gate+channel architecture is clearly thought through. The remaining issues are all P2/P3 — no blocking problems.

Recommendation: APPROVE (with the P2 items addressed as follow-up, not blocking).


@lsm lsm left a comment


Review by claude-opus-4-6 (Anthropic)

Model: claude-opus-4-6 | Client: NeoKai | Provider: Anthropic

Verification of Previous Round Fixes

I have verified that all claimed fixes from round 2 have been applied:

FIXED (P1) - Gate discovery mechanism: list_gates MCP tool is defined in M1 Task 1.3 subtask 1, plus workflow context injection in subtask 5. Both mechanisms documented in overview lines 53-54.

FIXED (P1) - Aggregate Gate reset on cycles: M1 Task 1.4 subtask 5 specifies cyclic channel traversal resets downstream gate data. M3 line 51 and M5 Task 5.1 subtask 5 provide consistent details. Overview line 219 summarizes.

FIXED (P1) - Worktree cleanup timing: M4 Task 4.3 subtasks 4-6 specify TTL-based reaper (7 days), immediate cleanup only on cancellation. Overview line 249 confirms.

FIXED (P1) - Permission model: allowedWriterRoles defined in M1 Task 1.1 subtask 6, M1 Task 1.3 subtask 3 (authorization check), and overview lines 58-65.

FIXED (P1) - WorkflowRunStatus type: Overview lines 69-75 and M1 Task 1.1 subtask 7 correctly specify using existing needs_attention status with failureReason field instead of adding failed.

FIXED (P1) - getGateArtifacts worktree lookup: M6 Task 6.1 subtask 2 specifies querying space_worktrees table with PR URL fallback. Acceptance criteria line 72 confirms.

FIXED (P2) - Gate data persistence: gate_data table specified in M1 Task 1.1 subtask 5, M1 Task 1.3 subtask 3, and overview line 46.

FIXED (P2) - Reject vs Aggregate gates: M3 lines 44-49 clearly distinguish review-aggregate-gate (quorum) from review-reject-gate (reject feedback) as separate gate instances.

FIXED (P2) - M2/M3 ordering: Overview line 224 states "M2 should be implemented after M3". M2 line 7 confirms "Implement M3 before M2".


New Issues Found

P2: Duplicate subtask numbering in M5 Task 5.1

File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/05-qa-agent-node.md
Lines: 33-34

Task 5.1 has two subtasks numbered "6":

  • Line 33: 6. Verify iteration counter increments on QA→Coding cycle
  • Line 34: 6. Unit tests for QA feedback loop

This is a minor documentation error that should be fixed for clarity.


P2: Incomplete gate data reset specification in M1 Task 1.4

File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/01-gate-data-store.md
Lines: 147-150

M1 Task 1.4 subtask 5 specifies three gates to reset:

  • Aggregate Gate votes reset to { votes: {} }
  • Code PR Gate data is preserved
  • Task Result Gate (QA) resets to {}

However, M3 (line 51) and M5 Task 5.1 subtask 5 specify four gates:

  • review-aggregate-gate votes reset
  • qa-result-gate reset
  • review-reject-gate reset <-- Missing from M1
  • code-pr-gate preserved

The review-reject-gate reset is missing from M1 Task 1.4. While the overview (line 219) only mentions "Aggregate Gate's vote data" which is less specific but not incorrect, the M1 task should be updated to include the review-reject-gate reset for consistency with M3 and M5.


Recommendation

APPROVE with minor fixes

All critical P1 issues from the previous round have been correctly addressed. The new issues found (P2: duplicate numbering, P2: incomplete gate reset spec in M1) are documentation consistency issues that do not affect the plan's implementability. The implementer can infer the correct behavior from M3/M5 specifications.

The plan is well-structured with clear dependencies, consistent gate+channel architecture, and comprehensive testing milestones.

- Fix duplicate subtask numbering in M5 Task 5.1 (6→7)
- Add review-reject-gate and qa-fail-gate to M1 Task 1.4 reset spec
- Add existing 'condition' type to GateConfig union in overview
- Add all gate IDs to allowedWriterRoles table (reject/fail gates)
- Clarify Plan Review is 1 reviewer (not 3) in overview diagram

@lsm lsm left a comment


🤖 Review by glm-5-turbo (GLM)

Model: glm-5-turbo | Client: NeoKai | Provider: GLM

Verification of 5 P2 Fixes (Round 4)

All 5 fixes were applied correctly. Narrow review of each fix:

Fix 1: Duplicate subtask numbering in M5 Task 5.1

Verified. Subtask 6 renamed to 7 at 05-qa-agent-node.md:34. Numbering now runs 1-7 with no duplicates.

Fix 2: Incomplete gate reset spec in M1 Task 1.4

Verified. 01-gate-data-store.md:148-153 now lists all 5 relevant gates: review-aggregate-gate (reset), review-reject-gate (reset), qa-result-gate (reset), qa-fail-gate (reset), code-pr-gate (preserved). Also changed "same transaction" to "same SQLite transaction" for precision.

Fix 3: Missing condition type in GateConfig union

Verified. 00-overview.md:32 now includes 'condition' in the Gate type union: type: 'pr' | 'human' | 'aggregate' | 'task_result' | 'condition' | 'always'. This matches the existing ChannelGateEvaluator which already supports condition gate type (noted at line 116).

Fix 4: Incomplete allowedWriterRoles table

Verified. 00-overview.md:58-65 now lists all 7 gate IDs with their writer roles: plan-pr-gate, plan-human-gate, code-pr-gate, review-aggregate-gate, review-reject-gate, qa-result-gate, qa-fail-gate. The previously-missing review-reject-gate and qa-fail-gate are now included.

Fix 5: Plan Review reviewer count

Verified. Both the pipeline diagram at 00-overview.md:12 and the workflow graph at line 196 now say "1 reviewer" instead of the ambiguous "reviewer agents" / "reviewers". Consistent throughout.

Minor Observation (P3)

M1 Task 1.4 now lists qa-fail-gate in the reset spec (line 151), but the cross-references to this reset in M3 (03-extended-coding-workflow.md:51) and M5 Task 5.1 (05-qa-agent-node.md:32) do not include qa-fail-gate in their reset lists. This is not a blocking issue since M1 Task 1.4 is the canonical implementation spec, but it is a minor consistency gap in the summary descriptions in M3 and M5.

Conclusion

All 5 P2 fixes are correctly applied. No regressions introduced. The plan is ready for implementation.

Recommendation: APPROVE


@lsm lsm left a comment


Review by claude-opus-4-6 (Anthropic)

Model: claude-opus-4-6 | Client: NeoKai | Provider: Anthropic

Verification of Round 3 P2 Fixes

I have verified all 5 P2 fixes from round 3 have been correctly applied:

FIXED (P2) - Duplicate subtask numbering in M5 Task 5.1: 05-qa-agent-node.md lines 33-34 now have correct numbering (subtask 6 → "Verify iteration counter...", subtask 7 → "Unit tests...").

FIXED (P2) - Incomplete gate reset spec in M1 Task 1.4: 01-gate-data-store.md lines 147-154 now includes all gates to reset: review-aggregate-gate, review-reject-gate, qa-result-gate, qa-fail-gate, with code-pr-gate preserved.

FIXED (P2) - Missing condition type in Gate type union: 00-overview.md line 32 now includes condition in the Gate type union: type: 'pr' | 'human' | 'aggregate' | 'task_result' | 'condition' | 'always'.

FIXED (P2) - Incomplete allowedWriterRoles table: 00-overview.md lines 58-66 now lists all 7 gate IDs with their roles, including review-reject-gate (reviewer) and qa-fail-gate (qa).

FIXED (P3) - Plan Review count mismatch: 00-overview.md lines 12 and 196 now correctly state "Plan Review (1 reviewer)" in both the pipeline description and workflow graph.


Minor Inconsistency Found (P3 - nit)

M1 Gate type union missing condition: 01-gate-data-store.md line 21 shows type: 'pr' | 'human' | 'aggregate' | 'task_result' | 'always' without condition, while the overview line 32 includes it. This is a minor documentation inconsistency that won't affect implementation since:

  • The overview is the authoritative architecture document
  • The condition type already exists in packages/shared/src/types/space.ts
  • The implementer will reference the actual codebase types

No action required - this is informational only.


Recommendation

APPROVE

All 5 P2 fixes from round 3 have been correctly applied. The plan is consistent, well-structured, and ready for implementation. The one minor inconsistency found (M1 Gate type missing condition) is a P3 nit that doesn't affect implementability.

Replace separate gate types (PRGate, AggregateGate, HumanGate, etc.)
with a single unified Gate entity. Three composable condition types
cover all workflow behaviors:

- `always`: always passes
- `check`: checks a single field (exists, ==, !=) — covers PR URL
  checks, human approval, task results
- `count`: counts matching entries in a map field — covers vote
  counting (reviewer quorum)

Key changes:
- Gate interface: { id, condition, data, allowedWriterRoles,
  description, resetOnCycle }
- One evaluateGate() function with 3-way switch (not per-type classes)
- resetOnCycle flag replaces hardcoded gate-type-specific reset logic
- All agents use same read_gate/write_gate/list_gates tools
- Updated all milestone files to use unified gate language
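The three condition types above can be sketched as a single `evaluateGate()` with a 3-way switch. Field names mirror the commit message; the exact `check` operator set and `count` shape are assumptions, not the plan's final API:

```typescript
// Unified gate evaluation: one pure function, no per-type classes.
type Condition =
  | { type: 'always' }
  | { type: 'check'; field: string; op: 'exists' | 'eq' | 'neq'; value?: unknown }
  | { type: 'count'; field: string; equals: unknown; atLeast: number };

function evaluateGate(condition: Condition, data: Record<string, unknown>): boolean {
  switch (condition.type) {
    case 'always':
      return true;
    case 'check': {
      const v = data[condition.field];
      if (condition.op === 'exists') return v !== undefined;
      if (condition.op === 'eq') return v === condition.value;
      return v !== condition.value;
    }
    case 'count': {
      // Count entries in a map field matching `equals` (e.g. approve votes).
      const map = (data[condition.field] ?? {}) as Record<string, unknown>;
      const n = Object.values(map).filter((v) => v === condition.equals).length;
      return n >= condition.atLeast;
    }
  }
}
```

Under these assumptions a PR gate is `{ type: 'check', field: 'prUrl', op: 'exists' }` and the reviewer quorum is `{ type: 'count', field: 'votes', equals: 'approve', atLeast: 3 }`.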

@lsm lsm left a comment


Review by glm-5-turbo (GLM)

Model: glm-5-turbo | Client: NeoKai | Provider: GLM

Recommendation: APPROVE (with minor P2/P3 cleanup suggestions)

The unified gate design is well-executed and consistent. The three condition types (always, check, count) cleanly cover all workflow behaviors, and the architecture in M1 (00-overview, 01-gate-data-store) is thorough and well-specified. The gate data store separation, resetOnCycle flag, and MCP tool design are solid.

However, there are leftover references to the old gate type terminology in M5, M7, and M8 that should be cleaned up before implementation begins.


P2: Leftover old gate type terminology in M5, M7, M8

The whole point of this revision was to eliminate separate gate type names (PRGate, AggregateGate, HumanGate, TaskResultGate). M1, M2, M3, and M6 are clean, but M5, M7, and M8 still use the old vocabulary:

05-qa-agent-node.md (5 instances):

  • Line 5: "QA sits between the Aggregate Gate (after 3 reviewers approve)"
  • Line 10: 3 Reviewers ──[Aggregate Gate: 3 yes]──► QA ──[Task Result: pass]──► Done
  • Line 12: └──[Task Result: fail, cyclic]──► Coding
  • Lines 10, 12: The ASCII diagram uses "Aggregate Gate" and "Task Result" as gate type labels instead of the unified condition syntax used everywhere else (count: votes.approve >= 3, check: result == passed)

07-online-integration-test.md (7 instances):

  • Line 16: File comment # Coding → PR Gate → 3 Reviewers → Aggregate Gate
  • Line 71: "Test Coding → PR Gate → 3 Reviewers (parallel) → Aggregate Gate"
  • Line 79: "Aggregate Gate stays blocked"
  • Line 85: "Aggregate Gate requires all 3 approvals"
  • Line 152: Uses review-aggregate-gate (wrong gate ID — should be review-votes-gate)

08-e2e-test.md (5 instances):

  • Line 59: "Wait for Aggregate Gate vote display"
  • Line 61: "Wait for Aggregate Gate to show reset"
  • Line 63: "Aggregate Gate shows 3/3"
  • Line 69: "Aggregate Gate vote display works"

06-human-gate-canvas-ui.md (1 instance):

  • Line 95: "shared aggregate gate indicator" — minor, but should be "shared vote-counting gate indicator"

Suggested fix: Replace all "Aggregate Gate" with "vote-counting gate" (or simply reference review-votes-gate by ID). Replace all "Task Result" with the specific condition (check: result == passed/failed). Fix the wrong gate ID review-aggregate-gate to review-votes-gate in M7 Task 7.5.


P2: review-reject-gate merge semantics could silently overwrite rejection feedback

In 01-gate-data-store.md line 128, write_gate uses merge semantics: "new keys added, existing keys updated." The review-reject-gate has allowedWriterRoles: ['reviewer'], and multiple reviewers could write { result: 'rejected', feedback: '...' } to it. With merge semantics, if reviewer-1 writes { result: 'rejected', feedback: 'A' } and reviewer-2 writes { result: 'rejected', feedback: 'B' }, the final data would be { result: 'rejected', feedback: 'B' } — reviewer-1's feedback is silently overwritten.

This is acceptable for MVP (any rejection triggers the cyclic channel immediately), but worth a brief note in the plan acknowledging this limitation.
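If the limitation is ever worth fixing, one option is to key feedback by writer rather than merging flat fields. A sketch under that assumption (`RejectGateData` and `writeRejection` are illustrative names, not the plan's API):

```typescript
// Per-writer feedback map: a second reviewer's rejection no longer
// clobbers the first reviewer's feedback text.
interface RejectGateData {
  result?: 'rejected';
  feedbackByWriter: Record<string, string>;
}

const rejectGate: RejectGateData = { feedbackByWriter: {} };

function writeRejection(writerId: string, feedback: string): void {
  rejectGate.result = 'rejected';
  rejectGate.feedbackByWriter[writerId] = feedback; // keyed write: no overwrite
}
```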


P3: PR description file list is outdated

The PR body lists 9 plan files (00-overview through 08-bug-fixes), but the actual files are now 00-overview through 09-bug-fixes (10 files, with 01-gate-data-store being new). The file numbering shifted when M1 was replaced. Update the PR body to match the actual file list.


P3: M5 Task 5.1 scope overlaps with M3 Task 3.1

M5 Task 5.1 "Wire QA into V2 Workflow" subtasks 1-6 are largely verification that the V2 template (defined in M3 Task 3.1) is correct. If M3's acceptance criteria already guarantee the V2 template has correct channel/gate wiring, then M5 Task 5.1 is mostly redundant verification work. Consider either strengthening M3's acceptance criteria to include QA wiring assertions (then simplifying M5 Task 5.1), or keeping M5 as-is but acknowledging it is an integration test milestone rather than a new implementation milestone.


@lsm lsm left a comment


Review by claude-opus-4-6 (Anthropic)

Model: claude-opus-4-6 | Client: NeoKai | Provider: Anthropic

Summary

The unified gate design is well-specified and complete. The refactoring successfully consolidates all gate types (PRGate, AggregateGate, HumanGate, TaskResultGate) into a single unified Gate entity with three composable condition types (always, check, count). This is a cleaner, more elegant design.

However, I found several inconsistencies where old terminology leaked through the refactoring:


P1 Issues

1. Inconsistent Gate ID naming in M7 test (line 152)

File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/07-online-integration-test.md

Problem: Line 152 references review-aggregate-gate, but the unified gate design names this review-votes-gate. The M3 workflow definition and all other milestones consistently use review-votes-gate.

Quote:

write 2 approve votes to review-aggregate-gate

Fix: Change to review-votes-gate to match M3's channel definition table.


P2 Issues

2. Legacy terminology in diagrams and descriptions (M5)

File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/05-qa-agent-node.md

Problem: Lines 5, 10, 12 use old terminology:

  • "Aggregate Gate" (should reference the unified review-votes-gate with count condition)
  • "Task Result: pass/fail" (should be the unified qa-result-gate/qa-fail-gate with check conditions)

Quotes:

QA sits between the Aggregate Gate (after 3 reviewers approve)
3 Reviewers ──[Aggregate Gate: 3 yes]──► QA ──[Task Result: pass]──► Done
                                          │
                                          └──[Task Result: fail, cyclic]──► Coding

Fix: Update diagrams to use the unified gate terminology:

3 Reviewers ──[check: votes.approve >= 3]──► QA ──[check: result == passed]──► Done
                                          │
                                          └──[check: result == failed, cyclic]──► Coding

3. Legacy terminology in test descriptions (M7)

File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/07-online-integration-test.md

Problem: Lines 16, 71, 79, 85 use "Aggregate Gate" instead of describing the unified review-votes-gate with count condition.

Fix: Either:

  • Replace "Aggregate Gate" with "review-votes-gate" or "Vote Gate"
  • Or clarify parenthetically: "Aggregate Gate (review-votes-gate with count condition)"

4. Legacy terminology in E2E test descriptions (M8)

File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/08-e2e-test.md

Problem: Lines 59, 61, 63, 69 use "Aggregate Gate" without clarifying it's now a unified gate with count condition.

Fix: Add parenthetical clarification or update to unified terminology.


5. Legacy terminology in canvas UI task (M6)

File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/06-human-gate-canvas-ui.md

Problem: Line 95 references "aggregate gate indicator" without clarifying this is now review-votes-gate with count condition.

Fix: Update to "review-votes-gate indicator (showing vote count)".


P3 Issues

6. "Human Gate" terminology is acceptable but could be clarified

The plan uses "human gate" as a descriptive term for gates that require human approval (specifically plan-approval-gate with check: approved == true). This is acceptable as a UX-friendly term, but the plan could note that "human gate" is just a use case of the unified Gate with check condition, not a separate type.

Suggestion: Add a note in M1 or M3 clarifying: "Throughout this plan, 'human gate' refers to a gate with condition { type: 'check', field: 'approved', op: '==', value: true } that is written via RPC rather than MCP tool."
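The "human gate as a check condition" idea above can be sketched in a few lines. This is a hypothetical illustration of the unified condition model the review describes, not the plan's actual implementation; the field names (`field`, `op`, `countValue`, `atLeast`) are assumptions based on this review's quotes.

```typescript
// Illustrative sketch of the two primitive condition types described in this
// review. Shapes are assumptions, not the plan's real type definitions.
type GateCondition =
  | { type: 'check'; field: string; op: '=='; value: unknown }
  | { type: 'count'; field: string; countValue: unknown; atLeast: number };

type GateData = Record<string, unknown>;

function evaluateCondition(cond: GateCondition, data: GateData): boolean {
  switch (cond.type) {
    case 'check':
      // Single-field comparison, e.g. { field: 'approved', op: '==', value: true }
      return data[cond.field] === cond.value;
    case 'count': {
      // Count entries of a map equal to countValue,
      // e.g. votes: { rev1: 'approve', rev2: 'approve' } against atLeast: 3
      const map = (data[cond.field] ?? {}) as Record<string, unknown>;
      const n = Object.values(map).filter((v) => v === cond.countValue).length;
      return n >= cond.atLeast;
    }
  }
}

// A "human gate" is then just a check condition written via RPC, not a type:
const planApproval: GateCondition = {
  type: 'check', field: 'approved', op: '==', value: true,
};
```

Under this model, `review-votes-gate` is the same `evaluateCondition` call with a `count` condition; no separate gate class is needed.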


What's Done Well

  1. Unified Gate interface is cleanly defined with clear type signature and field descriptions
  2. Three condition types (always, check, count) are sufficient to express all workflow behaviors
  3. M1 tasks comprehensively cover the implementation (schema, evaluator, MCP tools, channel router integration)
  4. M3 channel definition table is excellent - clearly shows all gates use unified format with condition configs
  5. resetOnCycle flag is well-documented with clear examples of which gates reset and which preserve data
  6. Workflow context injection design ensures agents know their gate IDs without hardcoding

Recommendation

REQUEST_CHANGES due to the P1 inconsistency (wrong gate ID name in M7 line 152). The P2 issues are important for consistency but could be addressed in a follow-up pass. The unified gate design itself is sound.


Gate ID Consistency Check Summary

Gate ID             Used in M3 (definition)  Used elsewhere
plan-pr-gate        Yes                      Consistent
plan-approval-gate  Yes                      Consistent
code-pr-gate        Yes                      Consistent
review-votes-gate   Yes                      M7 line 152 uses review-aggregate-gate (WRONG)
review-reject-gate  Yes                      Consistent
qa-result-gate      Yes                      Consistent
qa-fail-gate        Yes                      Consistent

Replace all occurrences of legacy gate type names (Aggregate Gate,
Human Gate, PR Gate, Code PR Gate, Plan PR Gate, Task Result) with
unified gate IDs and condition descriptions across M0, M2, M3, M5,
M6, M7, M8, M9.

P1 fix: corrected wrong gate ID `review-aggregate-gate` to
`review-votes-gate` in M7 Task 7.5.

@lsm lsm (Owner, Author) left a comment

🤖 Review by glm-5-turbo (GLM)

Model: glm-5-turbo | Client: NeoKai | Provider: GLM

Recommendation: APPROVE

Both terminology fixes have been verified across all milestone files (M0-M9).

P1 fix verified: review-aggregate-gate -> review-votes-gate

Searched all files for review-aggregate-gate -- zero matches. M7 (07-online-integration-test.md) now consistently uses review-votes-gate at lines 16, 71, 79, 85, and 152. The fix is correct.

P2 fix verified: Legacy gate type names eliminated

Searched all files for "Aggregate Gate", "Human Gate", "PR Gate" (as a class/type name), and "Task Result Gate" across M5, M6, M7, and M8 -- zero matches in those files. All instances have been replaced with unified gate language using specific gate IDs (e.g., review-votes-gate, plan-approval-gate, code-pr-gate) and condition descriptions (e.g., count: votes.approve >= 3, check: result == passed).

Minor note (not an issue)

Two informal references remain in 00-overview.md:

  • Line 63: "PR gate:", "Human approval gate:", "Vote gate:", "QA result gate:" -- These are prose labels in a "Gate data examples" block describing the purpose of each gate's data, not referring to legacy gate type classes. Acceptable as-is.
  • Line 130: "with human gate" -- Describes the existing V1 CODING_WORKFLOW in the "Current State Analysis" section, referring to what already exists in the codebase. Factual description, not prescribing terminology.

Both are appropriate contextual usage and do not conflict with the unified gate terminology.

@lsm lsm (Owner, Author) left a comment

🤖 Review by claude-opus-4-6 (Anthropic)

Model: claude-opus-4-6 | Client: NeoKai | Provider: Anthropic

Terminology Cleanup Verification (Round 6)

I have verified all milestone files (M0-M9) for correct unified gate terminology. The fixes from round 5 have been correctly applied.

Verification Results:

  1. P1 Fix (Wrong gate ID): The incorrect review-aggregate-gate in M7 has been corrected to review-votes-gate throughout. All 9 occurrences in 07-online-integration-test.md now use the correct gate ID.

  2. P2 Fix (Legacy gate type names): All legacy type names have been replaced with unified gate language:

    • "Aggregate Gate" → no longer used (only appears in negative context explaining what NOT to use)
    • "Human Gate" → no longer used as a type name
    • "PR Gate" → no longer used as a type name
    • "Task Result Gate" → no longer used

Current Gate IDs (Consistent Across All Milestones):

  • plan-pr-gate - Planner writes PR URL
  • plan-approval-gate - Human approval
  • code-pr-gate - Coder writes PR URL
  • review-votes-gate - Vote counting (3 reviewers)
  • review-reject-gate - Rejection feedback
  • qa-result-gate - QA pass result
  • qa-fail-gate - QA fail feedback

Legacy References (Correctly Used in Negative Context):

  • M0 (00-overview.md:21) and M1 (01-gate-data-store.md:55) mention PRGate, AggregateGate, HumanGate only to explain that these class-based types are NOT used - the unified Gate entity with composable conditions replaces them. This is correct usage.

Recommendation: APPROVE

All terminology issues from round 5 have been correctly addressed. The plan now uses consistent unified gate language across all 10 milestone files.


Issues Found: P0: 0 | P1: 0 | P2: 0 | P3: 0

… naming

1. Gate conditions: Add `all` (AND) and `any` (OR) composite condition
   types to GateCondition. These are recursive — sub-conditions can be
   any GateCondition including nested all/any. V2 workflow uses simple
   check/count conditions; composites enable future gates like
   "PR exists AND CI passes" without proliferating gates.

2. Worktree naming: Replace random adjective+number names (alpha-3) with
   task-title-derived slugs. The same slug is used for both the worktree
   folder and git branch (prefixed with space/). Example: task "Add dark
   mode support" → folder add-dark-mode-support, branch
   space/add-dark-mode-support. Includes slugification rules, collision
   handling, and empty-title fallback.
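The slugification rules in this commit message can be sketched as follows. This is a simplified illustration under the stated rules; the function name follows the plan, but the exact signature is an assumption, and the sketch truncates at 60 characters without the word-boundary refinement the plan specifies.

```typescript
// Sketch of the task-title slug rules above (assumed signature).
// Simplification: hard truncation at 60 chars, not word-boundary truncation.
function slugifyTaskTitle(title: string, existingSlugs: Set<string>, taskId: string): string {
  let slug = title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')  // spaces/special chars → hyphens, runs collapsed
    .replace(/^-+|-+$/g, '')      // strip leading/trailing hyphens
    .slice(0, 60);                // max length
  if (!slug) slug = `task-${taskId.slice(0, 8)}`; // empty-title fallback
  // Collision handling: append -2, -3, ... until unique
  let candidate = slug;
  for (let n = 2; existingSlugs.has(candidate); n++) {
    candidate = `${slug}-${n}`;
  }
  return candidate;
}
```

So task "Add dark mode support" yields folder `add-dark-mode-support` and branch `space/add-dark-mode-support`, matching the example in the commit message.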

@lsm lsm (Owner, Author) left a comment

🤖 Review by glm-5-turbo (GLM)

Model: glm-5-turbo | Client: NeoKai | Provider: GLM

Verdict: APPROVE

The two refinements (composite gate conditions and task-title worktree naming) are well-specified and consistently integrated. The plan is ready for implementation.


Refinement 1: Composite Gate Conditions (all/any)

Well-specified aspects:

  • The GateCondition discriminated union in 00-overview.md (line 39-44) and 01-gate-data-store.md (line 25-30) cleanly defines all five types with correct TypeScript signatures. The conditions field on all/any is typed as GateCondition[], which naturally enables recursive nesting.
  • Empty array semantics are explicitly defined: all with empty conditions returns true (vacuous truth), any with empty conditions returns false (01-gate-data-store.md lines 45-47). Both are standard mathematical conventions and the plan is consistent about this.
  • The evaluator logic in 01-gate-data-store.md Task 1.2 (lines 98-104) correctly maps to Array.every() and Array.some() for recursive evaluation.
  • Task 1.2 subtask 7 (line 110) explicitly calls for unit tests covering nested all/any and empty arrays.
  • M3 Task 3.1 subtask 8 (line 73) correctly notes that V2 uses only check and count conditions, with all/any reserved for future workflows.

One inconsistency (P2):

  • 01-gate-data-store.md line 5 says "three condition types" and line 95 says "handles all three condition types", but the rest of the document (and the overview) correctly says five. These two leftover references from before the composite condition refinement were not updated. Similarly, 00-overview.md line 160 says "the three condition types (always, check, count)". These are cosmetic but could confuse implementers.

Refinement 2: Task-Title Worktree Naming

Well-specified aspects:

  • 04-worktree-isolation.md Task 4.1 (lines 19-41) provides a complete specification of slugifyTaskTitle(title, existingSlugs): lowercase, hyphens for spaces/special chars, collapse consecutive hyphens, strip leading/trailing hyphens, max 60 chars at word boundary, collision suffix (-2, -3), and empty-title fallback to task-{shortId}.
  • The unit test list (line 35) covers the key edge cases: special characters, long titles, collision handling, empty/whitespace-only titles, and unicode handling.
  • The slug is consistently used for both folder name (.worktrees/{slug}/) and branch name (space/{slug}) across M4 Task 4.2 (lines 61, 63).
  • The SpaceWorktreeManager.createTaskWorktree() API (Task 4.2 line 56) returns the slug, enabling downstream consumers to use it.
  • No leftover references to the old random naming scheme (alpha-3, nova-7, adjective+number) exist anywhere in the plan files.

Minor observations (P3):

  • The overview (00-overview.md line 184) still says "short names" in the Phase 1 bullet. While not wrong, "task-title-derived slug names" (used in line 205 and 269) is more precise and consistent. This is purely cosmetic.

No New Issues

  • All downstream milestones (M3, M5, M7, M8, M9) consistently reference the gate architecture without contradicting the composite condition model.
  • The worktree naming is consistently referenced in M6 Task 6.1 (line 52) via the space_worktrees table lookup.
  • Cross-milestone dependencies remain correct.

Summary

The plan is coherent and ready. The only actionable finding is the "three condition types" wording leftover in two places (01-gate-data-store.md lines 5 and 95, 00-overview.md line 160) that should say "five" now that all/any have been added. This is a copy-paste oversight from before the composite condition refinement and is trivially fixable at implementation time, but fixing it in the plan would prevent implementer confusion.

@lsm lsm (Owner, Author) left a comment

🤖 Review by claude-opus-4-6 (Anthropic)

Model: claude-opus-4-6 | Client: NeoKai | Provider: Anthropic

Summary

The plan revisions for composite gate conditions (all/any) and task-title worktree naming are well-specified and complete. No leftover alpha-3/nova-7 references found. However, there is one inconsistency in the condition type count.


P2 Issues

1. Inconsistent "three condition types" vs "five condition types" phrasing

Files:

  • /docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/01-gate-data-store.md (lines 5, 95)
  • /docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/00-overview.md (line 160)

Issue: The plan states "three condition types" in multiple places but now defines five (always, check, count, all, any).

Locations:

  • M1 intro: "one Gate concept, three condition types, one set of MCP tools"
  • Task 1.2 description: "handles all three condition types"
  • Overview line 160: "Replace the current per-type evaluator logic with the three condition types (always, check, count)"

Correct phrasing should be: "five condition types (including all/any for composition)" or "three primitive condition types plus two composite types."

Recommendation: Update these phrases for consistency. This is a documentation clarity issue, not a design flaw — the actual type definition and evaluator logic correctly specify five types.


Verified Correctly

  1. Composite gate conditions are well-specified:

    • Type definition: { type: 'all'; conditions: GateCondition[] } and { type: 'any'; conditions: GateCondition[] } — correctly recursive
    • Empty array semantics: all with empty array returns true (vacuous truth), any with empty array returns false — standard logic convention
    • Evaluator logic: conditions.every() for all, conditions.some() for any — correct implementation approach
    • Unit tests for composite conditions specified: nested all/any, mixed pass/fail, empty arrays
  2. Task-title slugification rules are complete:

    • Max 60 chars with word-boundary truncation
    • Collision handling with -2/-3 suffix
    • Empty title fallback: task-{shortId} using first 8 chars of task UUID
    • Unicode handling mentioned in unit tests
  3. Consistent naming across all files:

    • No leftover alpha-3/nova-7/random-adjective references
    • Branch naming consistently space/{slug}
    • Worktree folder consistently .worktrees/{slug}/
  4. Downstream milestones (M3, M5, M6, M7, M8) correctly reference the unified gate architecture without contradicting the all/any addition.


Verdict

APPROVE (with one P2 documentation consistency fix). The plan is well-designed and ready for implementation after the minor phrasing update.

@lsm lsm (Owner, Author) left a comment

🤖 Review by glm-5-turbo (GLM)

Model: glm-5-turbo | Client: NeoKai | Provider: GLM

Verified: all three occurrences of "three condition types" have been correctly updated to "five condition types" across:

  • 00-overview.md lines 160, 199
  • 01-gate-data-store.md lines 5, 95

Zero stale references to "three condition types" remain. Fix is correct and complete.

Recommendation: APPROVE

@lsm lsm (Owner, Author) left a comment

🤖 Review by claude-opus-4-6 (Anthropic)

Model: claude-opus-4-6 | Client: NeoKai | Provider: Anthropic

Verification: "three condition types" → "five condition types" fix

Verified the text fix across all plan documents:

File                   Line  Status
00-overview.md         160   Correctly updated
01-gate-data-store.md  5     Correctly updated
01-gate-data-store.md  95    Correctly updated
  • No remaining "three condition types" references found
  • All "five condition types" references now accurately reflect the inclusion of all/any composite conditions

Recommendation: APPROVE — Trivial text fix correctly applied.

…ured send_message

Major architectural refinements:

1. Channels and gates are now fully separated concepts:
   - Channel = unidirectional pipe (from, to, optional gateId)
   - Gate = independent filter entity attached to a channel
   - A channel without a gate is always open (removes `always` condition type)
   - GateCondition now has 4 types: check, count, all, any

2. Bidirectional communication via two channels:
   - Each direction gets its own channel with its own optional gate
   - Example: planner→reviewer (gated by plan-pr-gate), reviewer→planner (no gate, always open)

3. Canvas visualization updated:
   - Lines between nodes = channels (directional arrows)
   - Gates render ON the channel line (not as separate nodes)
   - "+" button and drag-and-drop to add gates to channels in template mode
   - Removing a gate = making channel always open

4. Structured send_message data:
   - send_message now accepts { text, data? } where data carries structured info
   - Gate data updates can be embedded in message data

5. New list_channels MCP tool alongside list_gates for separate topology discovery

Updated M0 (overview), M1 (gate data store), M3 (workflow template), M6 (canvas UI).
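The channel/gate separation in this commit message can be sketched as below. The interface fields follow the text above; the `evaluateGate` callback is a hypothetical stand-in for the real evaluator, and the missing-gate fallback is an assumption.

```typescript
// Sketch of the separated Channel/Gate model: channels are simple pipes,
// gates are independent optional filters referenced by gateId.
interface Channel {
  id: string;
  from: string;
  to: string;
  gateId?: string;   // absent → channel is always open
  isCyclic?: boolean;
}

interface Gate {
  id: string;
  data: Record<string, unknown>;
}

function isChannelOpen(
  channel: Channel,
  gates: Map<string, Gate>,
  evaluateGate: (g: Gate) => boolean, // hypothetical condition evaluator
): boolean {
  if (!channel.gateId) return true;  // no gate: unconditional pipe
  const gate = gates.get(channel.gateId);
  return gate ? evaluateGate(gate) : true; // fallback for dangling gateId (assumption)
}
```

The bidirectional example then reads naturally: planner→reviewer carries `gateId: 'plan-pr-gate'`, while reviewer→planner has no `gateId` and is always open.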

@lsm lsm (Owner, Author) left a comment

🤖 Review by glm-5-turbo (GLM)

Model: glm-5-turbo | Client: NeoKai | Provider: GLM

Round 9 review of the refactored plan. The channel/gate separation, removal of the always condition type, bidirectional-as-two-channels model, structured send_message, and list_channels tool are all well-applied across the plan documents. Below are findings organized by severity.


Verification Results

1. Channel/Gate separation consistency across M0, M1, M3, M6 -- PASS

The separation is consistently applied. Channels are { id, from, to, gateId?, isCyclic? } everywhere. Gates are independent entities with { id, channelId, condition, data, allowedWriterRoles, description, resetOnCycle }. M3 correctly defines gates as separate from channels and references them by gateId. M6 references gate IDs from gate data stores, not from channel definitions.

2. Stale always condition references -- PASS (with minor note)

All always occurrences are in one of three legitimate contexts:

  • Describing the old system being replaced (M0 line 164, M1 line 105, M1 line 114, M1 line 147)
  • Explaining the new design ("a channel without a gate is always open" -- this is natural English, not a condition type reference)
  • M9 line 79: "the reset is always {}" -- natural English, not a condition type

No stale condition-type references remain.

3. Channel Router gateless channel handling -- PASS

M1 Task 1.2 explicitly specifies isChannelOpen(channel): if channel.gateId is absent, return true. M1 Task 1.4 specifies ChannelRouter.deliverMessage() uses isChannelOpen(). The current codebase already handles !channel.gate as open (channel-gate-evaluator.ts line 135-137, channel-router.ts line 281), so this is a natural evolution.

4. Structured send_message -- PASS

The { text: string, data?: Record<string, unknown> } shape is consistently specified in M0 (line 89-92), M1 Task 1.1 (line 112), M1 Task 1.3 (line 185), and M2 (line 22). The data field is extensible and well-described with concrete examples (PR URLs, gate data updates, review metadata).

5. Canvas interaction model -- PASS

M6 Task 6.2 clearly specifies gates as visual elements ON channel lines ("like a valve on a pipe"), with a "+" button for adding gates in template mode and drag-from-palette support. The three visual element types (nodes, channels, gates) are clearly distinguished.

6. New issues from refactoring -- see findings below.


Findings

P2: Migration path for condition (shell expression) and task_result gate types is unspecified

File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/01-gate-data-store.md, line 114

The migration subtask says: "map old always type to channel-without-gate, map old human type to gate with { type: 'check', field: 'approved', op: '==', value: true }, etc."

The "etc." is vague. The current codebase has four gate types: always, human, condition (shell expression evaluated via sh -c), and task_result (prefix match on task result string). The plan's four condition types (check, count, all, any) cover always (channel without gate), human (check field), and the V2 workflow use cases. But there is no equivalent for:

  • condition (arbitrary shell expressions): There is no type: 'shell' or type: 'exec' condition in the new design. If existing workflows use shell-expression gates, they cannot be migrated.
  • task_result (prefix match): The check type with op: '==' does exact equality, not prefix matching. If existing workflows use task_result with a prefix like 'passed' to match 'passed: all tests green', the migration would need a different op or a new op variant.

Recommendation: Either (a) explicitly state that condition and task_result gates are deprecated and existing workflows using them must be manually recreated, or (b) add a type: 'exec' condition (for shell expressions) and extend check with an op: 'startsWith' option (for task_result migration). Given the plan says "V1->V2 migration is out of scope" (M0 line 288), option (a) with a brief note is probably sufficient, but the "etc." should be expanded to be explicit about what is NOT migrated.

P2: channelId on Gate creates a 1:1 constraint that may not hold for shared gates

File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/00-overview.md, line 42; 01-gate-data-store.md, line 28

The Gate interface has channelId: string (singular), implying a gate is attached to exactly one channel. However, M3 explicitly defines shared gates: code-pr-gate is shared across three channels (ch-coding-to-rev1, ch-coding-to-rev2, ch-coding-to-rev3), and review-votes-gate is shared across three channels (ch-rev1-to-qa, ch-rev2-to-qa, ch-rev3-to-qa).

If a gate has a single channelId, it can only be attached to one channel. To share a gate across multiple channels, either:

  • The gate needs channelIds: string[] (plural), or
  • Each channel gets its own gate instance with the same condition (but then writes to one gate's data don't affect the others), or
  • The channelId is removed from Gate and the relationship is inverted (channels reference gates via gateId, which the plan already has).

Recommendation: Remove channelId from the Gate interface. The relationship should be channel -> gate (via channel.gateId), not gate -> channel. The gate is an independent entity; channels reference it. This aligns with the plan's own architecture where channels are the pipes and gates are optional attachments. The current Channel interface already has gateId?: string which is the correct direction for the relationship.

P2: Missing specification for how send_message data triggers gate data updates

File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/01-gate-data-store.md, line 185

M1 Task 1.3 subtask 5 says: "if data contains gate-targeted updates, they are applied to the appropriate gate's data store."

This is underspecified. Questions left unanswered:

  • What is the key structure for gate-targeted updates in data? The example in M0 (line 96-98) shows data: { gateData: { 'review-votes-gate': { votes: { approve: 1 } } } } -- is this the canonical format?
  • Does the channel router apply these updates before or after delivery? If before, the gate might re-evaluate and unblock before the message is delivered. If after, there's a timing question.
  • Who is authorized to write gate data via send_message? Is the allowedWriterRoles check applied? If so, which role is used -- the sender's role?

Recommendation: Specify the exact data key convention for gate updates (e.g., data.gateData), clarify the ordering (apply updates then deliver), and confirm that allowedWriterRoles is enforced using the sending agent's role.
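One way to picture the recommendation above: treat `data.gateData` as the canonical key, apply updates with the sender's role checked against `allowedWriterRoles`, then deliver. Everything here is an assumption illustrating the recommendation, since the plan deliberately leaves the convention open.

```typescript
// Hypothetical sketch of gate-targeted updates carried in send_message data,
// per the recommendation above: data.gateData keyed by gate ID, writer-role
// check enforced, updates applied before delivery/re-evaluation.
interface GateState {
  data: Record<string, unknown>;
  allowedWriterRoles: string[];
}

function applyGateUpdates(
  msg: { text: string; data?: Record<string, unknown> },
  senderRole: string,
  gates: Map<string, GateState>,
): void {
  const updates = (msg.data?.gateData ?? {}) as Record<string, Record<string, unknown>>;
  for (const [gateId, patch] of Object.entries(updates)) {
    const gate = gates.get(gateId);
    if (!gate) continue;                                        // unknown gate: skip
    if (!gate.allowedWriterRoles.includes(senderRole)) continue; // enforce writer roles
    Object.assign(gate.data, patch); // shallow merge; router re-evaluates after
  }
}
```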

P3: review-reject-gate shares a channelId ambiguity

File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/03-extended-coding-workflow.md, line 44

The channel ch-rev-to-coding (Reviewers -> Coding, cyclic) is listed with review-reject-gate. But there are 3 reviewer nodes. Is this one channel (from all reviewers to coding) or three channels (one per reviewer)? The M3 channel definitions table lists it as a single entry, but the channel addressing format uses agent names (e.g., ch-coding-to-rev1). If it's one channel, the from field would need to be an array or wildcard; if three, the table should list three entries.

Recommendation: Clarify whether ch-rev-to-coding is one channel with wildcard from or three separate channels (one per reviewer). This affects how review-reject-gate data is structured (does a single reject from any reviewer trigger it?).

P3: Gate.data field duplication between interface and gate_data table

File: docs/plans/space-feature-end-to-end-happy-path-single-space-single-task/00-overview.md, line 44; 01-gate-data-store.md, line 31

The Gate interface includes data: Record<string, unknown> and M1 Task 1.1 creates a separate gate_data SQLite table keyed by (run_id, gate_id). The interface data field is the gate definition (per-workflow template), while the gate_data table stores per-run data. However, the Gate interface is not clear about whether data is the initial/default value or is unused at the definition level.

Recommendation: Remove data from the Gate interface (it lives only in the gate_data table) or rename it to initialData / defaultData to clarify that it's a template default, not the runtime state.


Summary

The refactoring is well-executed. The channel/gate separation is clean and consistent. The four condition types (check, count, all, any) plus gateless channels cover all V2 workflow needs. The main issues are: (1) the channelId field on Gate conflicts with the shared-gate pattern used in M3, and (2) the migration "etc." needs to explicitly address condition and task_result gate types. Both are design-level clarifications that should be resolved before implementation begins.

Recommendation: REQUEST_CHANGES (2 P2 issues, 2 P3 issues)

@lsm lsm (Owner, Author) left a comment

Review by claude-opus-4-6 (Anthropic)

Model: claude-opus-4-6 | Client: NeoKai | Provider: Anthropic

Summary

The architectural refinements have been consistently applied across all milestones. The Channel/Gate separation is clean, the four condition types (check, count, all, any) are well-specified, and the always condition type has been properly removed (channels without gates are implicitly always open).

Verification Results

Channel/Gate Separation (M0, M1, M3, M6): Consistent across all files.

  • Channels: { id, from, to, gateId?, isCyclic? } - simple pipes
  • Gates: { id, channelId, condition, data, allowedWriterRoles, description, resetOnCycle } - optional filters

No Stale always References: Clean. All mentions are in appropriate contexts:

  • M0 line 164: Describing existing implementation to be replaced ("What Already Exists" section)
  • M0 line 184: Explaining the change being made
  • M1 line 37, 82: Code comments explaining the design

Four Condition Types: Consistently defined in M0 and M1:

  • check: Single field evaluation with operators
  • count: Map field value counting
  • all: AND composition (recursive)
  • any: OR composition (recursive)

Bidirectional = Two Channels: Clearly documented in M0 (lines 27, 146) and M1 (line 47).

Structured send_message: Well-specified in M0 (lines 88-98) and M1 Task 1.3 subtask 5:

{ text: string, data?: Record<string, unknown> }

list_channels MCP Tool: Properly defined in M1 Task 1.3 subtask 1.

Canvas Visualization (M6):

  • Gates rendered ON channel lines (line 14: "like a valve on a pipe")
  • "+" button to add gates (line 20, 102)
  • Drag-and-drop from palette (line 20, 103)
  • Gateless channels render as plain arrows (line 15)

Channel Router Logic: M1 Task 1.2 subtask 2 and Task 1.4 subtask 1 specify isChannelOpen() returns true for channels without gates.

Issues Found

P2 - M6 Task 6.2: Template mode switching undefined
Location: /docs/plans/.../06-human-gate-canvas-ui.md line 101
Subtask 6 mentions gate editing "When viewing a workflow template (not a running instance)" but doesn't specify the UI mechanism for switching between instance view and template view. Consider adding a subtask or acceptance criterion that defines this interaction (e.g., a toggle, tab, or separate route).

P3 - M0: Bidirectional example may confuse
Location: /docs/plans/.../00-overview.md lines 51-53
The bidirectional example shows reviewer → planner feedback channel without gate, but V2 workflow doesn't include this channel (it uses cyclic channels back to Coding instead). Consider adding a clarifying note that this is a conceptual example, not the V2 workflow topology.

Recommendation

APPROVE with minor P2/P3 suggestions. The plan is ready for implementation. The architectural consistency is excellent, and the refinements have been applied thoroughly.

… switching

P2 fixes from reviewer feedback:

1. Remove channelId from Gate interface — gates are independent entities
   referenced by channels via gateId. A single gate can be shared by
   multiple channels (e.g., code-pr-gate shared by 3 reviewer channels).
   The relationship is channels→gate, not gate→channel.

2. Specify migration approach for all existing gate types:
   always → channel without gate, human → check approved,
   condition (shell expr) → check result, task_result → check result.
   Includes in-flight run migration strategy.

3. Define template mode vs runtime mode in M6 canvas:
   mode determined automatically by context (active run → runtime/read-only,
   no active run → template/editable). No explicit toggle needed.

@lsm lsm (Owner, Author) left a comment

Review by claude-opus-4-6 (Anthropic)

Model: claude-opus-4-6 | Client: NeoKai | Provider: Anthropic

Summary

All 3 P2 issues from round 9 have been correctly addressed. The plan is now consistent and ready for implementation.

Fix Verification

Fix 1: Removed channelId from Gate interface - VERIFIED

The Gate interface in M1 (01-gate-data-store.md lines 27-34) correctly contains no channelId. Gates are independent entities referenced by channels via gateId. The only channelId occurrence in the plan is in list_channels return value (M1 Task 1.3), which is the channel's own ID, not a back-reference from Gate.

Fix 2: Specified migration paths for all 4 existing gate types - VERIFIED

M1 Task 1.1 subtask 10 (lines 114-119) explicitly defines migration paths for all 4 existing gate types:

  • always -> remove gate, set channel's gateId to undefined
  • human -> gate with { type: 'check', field: 'approved', op: '==', value: true }
  • condition -> gate with { type: 'check', field: 'result', op: '==', value: 'passed' }
  • task_result -> gate with { type: 'check', field: 'result', op: '==', value: 'passed' }
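The migration table above can be expressed as a small mapping function — a sketch only, with the legacy type names and `CheckCondition` shape taken from the bullets above rather than from the actual migration code:

```typescript
// Hypothetical mapping of the four legacy gate types to the new model.
type LegacyGateType = "always" | "human" | "condition" | "task_result";

interface CheckCondition {
  type: "check";
  field: string;
  op: "==";
  value: unknown;
}

// Returns the new gate condition, or null when the legacy type maps
// to "no gate" (the channel's gateId becomes undefined).
function migrateGateType(legacy: LegacyGateType): CheckCondition | null {
  switch (legacy) {
    case "always":
      return null; // remove gate; a channel without a gate is always open
    case "human":
      return { type: "check", field: "approved", op: "==", value: true };
    case "condition":
    case "task_result":
      return { type: "check", field: "result", op: "==", value: "passed" };
  }
}
```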

Fix 3: Defined template/runtime mode switching - VERIFIED

M6 Task 6.2 subtask 6 (lines 101-104) clearly defines automatic mode switching:

  • Runtime mode when active workflow run exists
  • Template mode when no active run and user is in workflow editor
  • No explicit toggle button - context determines mode

Acceptance criteria (lines 119-121) also verify this behavior.
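The context-driven mode rule verified above reduces to a one-line derivation — sketched here with invented helper names, since the plan specifies behavior, not code:

```typescript
// Hypothetical derivation of canvas mode from context: an active
// workflow run forces read-only runtime mode; otherwise the editor
// opens in editable template mode. No explicit toggle exists.
type CanvasMode = "runtime" | "template";

function deriveCanvasMode(hasActiveRun: boolean): CanvasMode {
  return hasActiveRun ? "runtime" : "template";
}

const isReadOnly = (mode: CanvasMode): boolean => mode === "runtime";
```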

Consistency Check

All milestone files consistently reference:

  • Four condition types (check, count, all, any) - not three
  • Channels as simple pipes with optional gateId reference
  • Gates as independent entities with no back-reference to channels
  • Automatic mode switching based on context
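The four condition types listed above compose naturally as a discriminated union with a recursive evaluator. A sketch under assumed semantics (the `check` field/value shape matches the migration examples; the `count` semantics — array length against a threshold — is an illustrative guess):

```typescript
// Hypothetical sketch of the four composable condition types.
type Condition =
  | { type: "check"; field: string; op: "=="; value: unknown }
  | { type: "count"; field: string; op: ">="; value: number }
  | { type: "all"; conditions: Condition[] }
  | { type: "any"; conditions: Condition[] };

type GateData = Record<string, unknown>;

function evaluate(c: Condition, data: GateData): boolean {
  switch (c.type) {
    case "check":
      return data[c.field] === c.value;
    case "count": {
      // assumed: the field holds an array (e.g. collected votes)
      const v = data[c.field];
      return Array.isArray(v) && v.length >= c.value;
    }
    case "all":
      return c.conditions.every((x) => evaluate(x, data));
    case "any":
      return c.conditions.some((x) => evaluate(x, data));
  }
}
```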

Recommendation

APPROVE - The plan is well-structured, internally consistent, and ready for implementation. All requested fixes have been correctly applied.


🤖 Review by glm-5-turbo (GLM)

Model: glm-5-turbo | Client: NeoKai | Provider: GLM

Verification of 3 P2 Fixes (Round 10)

All three fixes have been verified across all 10 milestone files. No inconsistencies found.

Fix 1: Removed channelId from Gate interface -- VERIFIED

The Gate interface in both 00-overview.md (line 41) and 01-gate-data-store.md (line 27) contains no channelId field. The interface has exactly 6 fields: id, condition, data, allowedWriterRoles, description, resetOnCycle. The single occurrence of channelId in the plan is on 01-gate-data-store.md line 177 in the list_channels return value -- this is the channel's own id field, which is correct.

The shared-gate concept is consistently described across all files:

  • 00-overview.md line 40: "A single gate can be shared by multiple channels"
  • 01-gate-data-store.md lines 11, 26, 107: multiple references to shared gates
  • 03-extended-coding-workflow.md lines 36-41, 48, 88: explicit shared-gate annotations in channel/gate tables
  • M1 Task 1.4 (line 219) correctly describes shared gate behavior for code-pr-gate and review-votes-gate

The "No back-reference to channel" language appears in both 01-gate-data-store.md (lines 26, 107) and 03-extended-coding-workflow.md (lines 48, 88), consistently reinforcing the channels-reference-gates direction.

Fix 2: Migration paths for all 4 existing gate types -- VERIFIED

01-gate-data-store.md M1 Task 1.1 subtask 10 (lines 114-119) specifies migration paths for all 4 types:

  • always -- remove gate, set gateId to undefined (channel without gate = always open)
  • human -- gate with { type: 'check', field: 'approved', op: '==', value: true }
  • condition (shell expression) -- gate with { type: 'check', field: 'result', op: '==', value: 'passed' } with explanation of paradigm shift
  • task_result -- gate with { type: 'check', field: 'result', op: '==', value: 'passed' } with note that it shares the same mapping as condition

The subtask also includes backward compatibility guidance: "Ensure backward compatibility: existing workflow runs in progress at migration time should continue to work. For in-flight runs, populate gate_data table from the old gate state." This is a good addition that addresses the practical migration concern.

Fix 3: Template/runtime mode switching in M6 Task 6.2 -- VERIFIED

06-human-gate-canvas-ui.md Task 6.2 subtask 6 (lines 101-104) defines two modes:

  • Runtime mode (active when a workflow run exists): read-only visualization
  • Template mode (active when viewing/editing a workflow template with no active run): editable

The mode is determined automatically by context with no explicit toggle: "if the Space has an active workflow run -> runtime mode; if no active run and the user navigates to the workflow editor -> template mode." The acceptance criteria (line 120) reiterates: "Mode determined automatically by context (active run -> runtime, no run -> template)."

Additional Consistency Checks

  • Condition type count: All files consistently say "four condition types" (check, count, all, any). No stale "three" or "five" references remain.
  • Section naming: "Channels + Gates" ordering used consistently (old "Gates + Channels" fully corrected).
  • Old gate type references: Only appear in appropriate migration/audit contexts (M1 Task 1.1), not as current design references.
  • list_gates return value (01-gate-data-store.md line 181): returns { gateId, condition, description, allowedWriterRoles, currentData } -- no channelId field, confirming gates are independent.

Conclusion

All three P2 fixes are correctly implemented and consistent across all milestone files. No issues found.

Recommendation: APPROVE

Two foundational data model enhancements:

1. Numeric task IDs (M1 Task 1.5): Replace UUID-based task
   identification with space-scoped auto-incrementing numeric IDs.
   Adds task_number column with UNIQUE(space_id, task_number).
   UUID stays as internal PK for FK references. Human-friendly:
   "task #5" instead of UUIDs.

2. Space slugs (M1 Task 1.6): Add slug column to spaces for
   human-readable navigation. Auto-generated from name, editable.
   Enables /space/neokai-dev instead of /space/{uuid}.

Together they create a GitHub-style reference hierarchy:
  neokai-dev       → space slug
  neokai-dev#5     → task reference
  space/add-dark-mode-support → worktree branch

Updated M0 (overview with new architecture section), M1 (two new
tasks), M4 (shared slugify utility, numeric task fallback).
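The GitHub-style reference hierarchy above could be parsed with a sketch like this (the `parseTaskRef` helper and its regex are illustrative, not part of the plan):

```typescript
// Hypothetical parser for "space-slug#taskNumber" references like
// "neokai-dev#5". Slugs are assumed lowercase kebab-case; task
// numbers start at 1.
interface TaskRef {
  spaceSlug: string;
  taskNumber: number;
}

function parseTaskRef(ref: string): TaskRef | null {
  const m = /^([a-z0-9]+(?:-[a-z0-9]+)*)#([1-9]\d*)$/.exec(ref);
  if (!m) return null;
  return { spaceSlug: m[1], taskNumber: Number(m[2]) };
}
```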

Review by claude-opus-4-6 (Anthropic)

Model: claude-opus-4-6 | Client: NeoKai | Provider: Anthropic


Summary

The plan for Tasks 1.5 (Numeric Task IDs) and 1.6 (Space Slugs) is well-designed with solid implementation approaches. However, I identified a few issues that should be addressed before implementation.


P1 Issues (Should Fix)

1. Circular Dependency Between M1 Task 1.6 and M4 Task 4.1

File: 01-gate-data-store.md (Task 1.6, line 281) and 04-worktree-isolation.md (Task 4.1, lines 21, 25)

Task 1.6 says:

"Reuse the slugifyTaskTitle() function from M4 Task 4.1 (or extract into a shared slugify() utility...)"

But Task 4.1 says:

"Create a shared slugification utility that converts names/titles into valid slug strings. Used by worktree naming (task title → folder/branch), space slugs (M1 Task 1.6)..."

This creates a confusing circular reference. M1 Task 1.6 says it depends on M4 Task 4.1, but both tasks have Depends on: nothing and are marked as parallel.

Recommendation: Either:

  • (A) Make Task 4.1 the canonical source of the slugify() utility and have Task 1.6 explicitly depend on Task 4.1
  • (B) Move the slugify() utility definition to Task 1.6 and have Task 4.1 depend on Task 1.6
  • (C) Create a new shared task in M1 that defines slugify(), which both Task 1.6 and Task 4.1 depend on

Option (B) is cleanest since M1 is the foundation milestone.


P2 Issues (Important Suggestions)

2. Task Count Inaccurate

File: 00-overview.md (line 355)

The overview states:

~32 tasks across 9 milestones

Actual count is 37 tasks:

  • M1: 6 tasks
  • M2: 4 tasks
  • M3: 4 tasks
  • M4: 4 tasks
  • M5: 3 tasks
  • M6: 4 tasks
  • M7: 5 tasks
  • M8: 3 tasks
  • M9: 4 tasks

Recommendation: Update to "~37 tasks across 9 milestones" or just "37 tasks across 9 milestones".


3. Subtask 1 in Task 1.5 Conflicts with Subtask 2

File: 01-gate-data-store.md (Task 1.5, lines 251-252)

Subtask 1 says:

"change id column from UUID TEXT to INTEGER PRIMARY KEY AUTOINCREMENT"

Subtask 2 says:

"Implementation approach: Add a task_number column... The UUID id column stays as the internal primary key"

These contradict each other. The implementation approach in subtask 2 is correct, but subtask 1 is misleading.

Recommendation: Remove or revise subtask 1 to align with the implementation approach in subtask 2. For example:

"Add task_number column (INTEGER NOT NULL) to the space_tasks table with UNIQUE(space_id, task_number) constraint."


4. Missing Contiguity Edge Case for Task 1.5

File: 01-gate-data-store.md (Task 1.5)

The acceptance criteria states:

"Numeric IDs are contiguous within a space (1, 2, 3, ...)"

However, the implementation uses COALESCE(MAX(task_number), 0) + 1, which will NOT produce contiguous IDs if a task is deleted. For example: create tasks 1, 2, 3 → delete task 2 → create new task → task 4 (not task 2).

This is actually the correct behavior (matches GitHub issue numbering), but the acceptance criteria is misleading.

Recommendation: Change acceptance criteria to:

"Numeric IDs are monotonically increasing within a space (gaps may occur on deletion, like GitHub issues)"

Or clarify that contiguity is not guaranteed after deletions.
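The MAX+1 numbering semantics can be demonstrated without a database. An in-memory sketch (not the plan's SQLite implementation) of `COALESCE(MAX(task_number), 0) + 1`:

```typescript
// Hypothetical in-memory model of SELECT COALESCE(MAX(task_number), 0) + 1.
// Shows monotonically increasing numbers with gaps after deletion.
const taskNumbers = new Set<number>();

function nextTaskNumber(): number {
  const max = taskNumbers.size > 0 ? Math.max(...taskNumbers) : 0; // COALESCE(MAX, 0)
  const n = max + 1;
  taskNumbers.add(n);
  return n;
}

nextTaskNumber(); // task 1
nextTaskNumber(); // task 2
nextTaskNumber(); // task 3
taskNumbers.delete(2); // delete task 2
nextTaskNumber(); // task 4 — not 2: numbers are never reused, like GitHub issues
```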


P3 Issues (Nits)

5. Reference to slugifyTaskTitle() Should Be slugify()

File: 01-gate-data-store.md (Task 1.6, line 281)

The task references slugifyTaskTitle() from M4, but Task 4.1 defines it as slugify(). While the parenthetical clarifies this, it's slightly confusing.

Recommendation: Just reference slugify() directly since that's what Task 4.1 defines.


What Looks Good

  1. Task 1.5 implementation approach: Keeping UUID as internal PK while adding task_number for human-facing ID is the right call — avoids risky FK migrations.

  2. Transaction safety: The SELECT COALESCE(MAX...) + 1 approach with transaction is correctly specified.

  3. Space slug spec: Complete with generation, editing, uniqueness validation, backfill migration, and frontend routing.

  4. Reference hierarchy: The neokai-dev#5 pattern is well-defined and consistent throughout.

  5. Cross-references to M4: Task 4.1 correctly references Task 1.5 for the task-{taskNumber} fallback.


Recommendation

REQUEST_CHANGES due to the P1 circular dependency issue between M1 Task 1.6 and M4 Task 4.1. The dependency direction needs to be clarified before implementation begins.


🤖 Review by glm-5-turbo (GLM)

Model: glm-5-turbo | Client: NeoKai | Provider: GLM

Round 11 -- focused on the two new M1 foundational features (Task 1.5 Numeric Task IDs, Task 1.6 Space Slugs) and overall plan consistency.

P1: Overview SpaceTask interface contradicts Task 1.5 implementation approach

The overview (00-overview.md line 149-154) shows id: number as the primary key:

interface SpaceTask {
  id: number;              // INTEGER PRIMARY KEY AUTOINCREMENT
  spaceId: string;

But Task 1.5 subtask 2 in 01-gate-data-store.md (line 252) explicitly chooses a different approach: keep UUID id as the internal PK and add a separate task_number column. This is the safer design -- avoids migrating all FK references. The overview code snippet needs to be updated to match, e.g.:

interface SpaceTask {
  id: string;              // UUID -- primary key (unchanged)
  taskNumber: number;      // INTEGER NOT NULL -- human-facing, auto-increment, UNIQUE(space_id, task_number)
  spaceId: string;

This contradiction would confuse an implementer who reads the overview first.

File: docs/plans/.../00-overview.md lines 147-154
File: docs/plans/.../01-gate-data-store.md line 252

P1: Contiguous task numbers is incompatible with task deletion

Task 1.5 acceptance criteria (line 263) states: "Numeric IDs are contiguous within a space (1, 2, 3, ...)". But the COALESCE(MAX(task_number), 0) + 1 approach (subtask 3, line 253) produces gaps when tasks are deleted (e.g., delete task 3, next task becomes 4, not 3).

This is fine for a GitHub-style model (GitHub issue numbers also have gaps), but the "contiguous" claim should be removed from the acceptance criteria. Alternatively, if contiguous numbering is truly required, you would need to renumber on deletion, which is expensive and error-prone. The current approach is the right one -- just remove the misleading claim.

File: docs/plans/.../01-gate-data-store.md line 263

P2: M4 Task 4.1 fallback references M1 Task 1.5 but claims no dependency

Task 4.1 subtask 2 (line 33 in 04-worktree-isolation.md) says: "fall back to task-{taskNumber} (the numeric task ID from M1 Task 1.5)". But Task 4.1's dependency is listed as "nothing" (line 43). This creates an implicit dependency -- if M4 is implemented before M1.5 completes, the fallback cannot use taskNumber.

Since both M1.5 and M4.1 have "Depends on: nothing", this is mostly a documentation issue. But the plan should either: (a) acknowledge the implicit dependency and make M4.1 depend on M1.5, or (b) change the fallback to use the UUID (e.g., task-{uuid.slice(0,8)}) when taskNumber is not yet available.

File: docs/plans/.../04-worktree-isolation.md line 33, 43

P2: PR description file list is stale (8 milestones, old filenames)

The PR body lists 9 files with old milestone numbering (M1=enhanced-node-agent-prompts, M3=node-agent-session-factory, M5=human-gate-and-completion) reflecting the previous 8-milestone structure. The actual files on the branch follow the new 9-milestone numbering where:

  • M1 is now 01-gate-data-store.md (was M1+parts of old M2)
  • M3 is now 03-extended-coding-workflow.md (was M2)
  • 03-node-agent-session-factory.md no longer exists
  • New file 09-bug-fixes-and-hardening.md is not in the PR body

The PR body should be updated to match the actual files, and the title's "8-milestone plan" should become "9-milestone plan".

File: PR description body and title

P2: Task count in overview is inaccurate (~32 stated, 37 actual)

The overview (line 355) says "~32 tasks across 9 milestones". Actual count from all milestone files: 37 tasks (M1=6, M2=4, M3=4, M4=4, M5=3, M6=4, M7=5, M8=3, M9=4). Update to "~37 tasks".

File: docs/plans/.../00-overview.md line 355

P2: Overview cross-reference line still says "Cross-space refs use spaceId + taskId"

Line 158 says "Cross-space refs use spaceId + taskId". With the task_number approach (Task 1.5 subtask 2), the cross-space reference should use spaceId + taskNumber, not spaceId + taskId (which would be the UUID). This should be clarified to match the actual implementation.

File: docs/plans/.../00-overview.md line 158

P3: M1 Task 1.6 references slugifyTaskTitle() which does not exist

Task 1.6 subtask 2 (line 281) says "Reuse the slugifyTaskTitle() function from M4 Task 4.1". But M4 Task 4.1 defines slugify(input: string, existingSlugs: string[]): string -- the function is called slugify, not slugifyTaskTitle(). Minor naming mismatch.

File: docs/plans/.../01-gate-data-store.md line 281
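For reference, the `slugify(input: string, existingSlugs: string[]): string` signature defined by Task 4.1 might look like this sketch (the normalization rules and numeric-suffix scheme are assumptions, not the plan's spec):

```typescript
// Hypothetical implementation of the shared slugify utility:
// lowercase kebab-case output with numeric-suffix collision resolution.
function slugify(input: string, existingSlugs: string[] = []): string {
  const base =
    input
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-") // runs of non-alphanumerics -> single hyphen
      .replace(/^-+|-+$/g, "") || "untitled"; // assumed fallback for empty input
  const taken = new Set(existingSlugs);
  if (!taken.has(base)) return base;
  for (let i = 2; ; i++) {
    const candidate = `${base}-${i}`;
    if (!taken.has(candidate)) return candidate;
  }
}
```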

P3: Space slug uniqueness validation detail missing

Task 1.6 subtask 4 mentions updateSlug validates uniqueness but does not specify the error behavior. Should clarify: does it return an error string? Throw? Append a suffix? The auto-generation uses suffix collision resolution, but manual editing should probably reject (not auto-append) to respect user intent.

File: docs/plans/.../01-gate-data-store.md line 283
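One way to resolve this ambiguity: auto-generation appends a suffix, while manual edits reject duplicates outright. A sketch of the reject behavior (the `SlugResult` shape and `updateSlug` name are invented for illustration):

```typescript
// Hypothetical manual-edit validation: reject a duplicate slug with an
// error rather than silently appending a suffix, respecting user intent.
type SlugResult =
  | { ok: true; slug: string }
  | { ok: false; error: string };

function updateSlug(requested: string, existingSlugs: string[]): SlugResult {
  if (!/^[a-z0-9]+(?:-[a-z0-9]+)*$/.test(requested)) {
    return { ok: false, error: `invalid slug format: "${requested}"` };
  }
  if (existingSlugs.includes(requested)) {
    return { ok: false, error: `slug already in use: "${requested}"` };
  }
  return { ok: true, slug: requested };
}
```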

No stale UUID references in downstream milestones

Verified that downstream milestones (M3-M9) consistently use UUID taskId for internal references (worktree mapping, gate data keys, etc.) which is correct -- the UUID stays as the internal PK per Task 1.5's design. No stale UUID-for-task-ID issues found.

Task 1.5 and 1.6 are well-specified overall

The implementation approach for both features is sound:

  • Task 1.5: Separate task_number column alongside UUID PK is the right call -- avoids cascading FK migration
  • Task 1.6: Shared slugify() utility used by both space slugs and worktree slugs is clean
  • Auto-increment via COALESCE(MAX, 0) + 1 in a transaction is correct for SQLite (no AUTOINCREMENT per-partition)
  • Backfill migration with collision handling for existing spaces is accounted for
  • Reference hierarchy (neokai-dev#5) is well-defined and consistently referenced across milestones

- Fix circular dependency: move slugify utility ownership to M1 Task 1.6,
  M4 Task 4.1 now wraps it with worktree-specific fallback behavior
- Fix SpaceTask interface: show id (UUID) + taskNumber (number), not id: number
- Fix "contiguous" claim to "monotonically increasing" (MAX+1 leaves gaps)
- Fix task count: ~32 → ~37 (actual count after adding Tasks 1.5/1.6)
- Add explicit dependency M4.1 → M1.5 (worktree fallback uses taskNumber)
- Update PR description to reflect 9-milestone structure with correct file names

🤖 Review by glm-5-turbo (GLM)

Model: glm-5-turbo | Client: NeoKai | Provider: GLM

Round 12 Verification

Verified all six fixes from round 11 are correctly implemented and internally consistent.

P1 Fixes

1. Circular dependency M1.6 <-> M4.1 -- Resolved. M1 Task 1.6 (01-gate-data-store.md:281-286) owns the shared slugify() utility in packages/daemon/src/lib/space/slug.ts with explicit note: "This utility is reused by M4 Task 4.1 (worktree naming) and any future slug needs." M4 Task 4.1 (04-worktree-isolation.md:19-36) now imports from it and depends on M1 Task 1.6. No circular dependency remains.

2. SpaceTask interface contradiction -- Resolved. The interface (00-overview.md:148-156) now clearly defines id: string (UUID, internal PK for FK references) and taskNumber: number (space-scoped, human-facing). M1 Task 1.5 (01-gate-data-store.md:252) reinforces this: "The UUID id column stays as the internal primary key for foreign key references...while task_number is the human-facing numeric ID." No contradiction.

3. "Contiguous" claim -- Resolved. All references now use "monotonically increasing" with explicit gap notes. 00-overview.md:160: "Numbers are NOT contiguous -- gaps may appear when tasks are deleted (e.g., 1, 2, 5 if tasks 3 and 4 were deleted). Like GitHub issue numbers." M1 Task 1.5 acceptance criteria (01-gate-data-store.md:262-263): "gaps may appear on deletion, like GitHub issue numbers." A grep for "contiguous" returns zero stale references.

P2 Fixes

4. Task count -- Verified. Actual task count across all 9 milestones: M1(6) + M2(4) + M3(4) + M4(4) + M5(3) + M6(4) + M7(5) + M8(3) + M9(4) = 37. Matches the claimed "~37 tasks" in 00-overview.md:356 and the PR description "9-milestone plan (37 tasks)."

5. PR description -- Verified. PR description correctly states "9-milestone plan (37 tasks)" and lists all 9 plan files (00-overview through 09-bug-fixes-and-hardening).

6. Explicit dependency M4.1 -> M1.5 -- Verified. M4 Task 4.1 (04-worktree-isolation.md:36): "Depends on: M1 Task 1.5 (numeric task IDs for fallback), M1 Task 1.6 (shared slugify utility)." The M1.5 dependency is needed for the task-{taskNumber} fallback slug when the task title is empty.

Additional Consistency Check

Verified no stale references remain: grep for "three condition types" returns zero hits; all references consistently say "four condition types" (check, count, all, any). The commit 02e948a addressed this.

Verdict

All six fixes are correct and internally consistent. No remaining issues found.

Recommendation: APPROVE


Review by claude-opus-4-6 (Anthropic)

Model: claude-opus-4-6 | Client: NeoKai | Provider: Anthropic

All 6 fixes from round 11 have been correctly applied and are internally consistent.

Verified Fixes

  1. P1: Circular dependency M1.6 ↔ M4.1 - FIXED. M1 Task 1.6 now owns slugify() in ./slug.ts, and M4 Task 4.1 explicitly imports from it (lines 21, 24 of 04-worktree-isolation.md).

  2. P1: Contradictory SpaceTask interface - FIXED. Now correctly shows id: string (UUID primary key) + taskNumber: number (human-facing numeric ID) in 00-overview.md lines 150-151.

  3. P1: "Contiguous" claim - FIXED. Changed to "Monotonically increasing" with explicit gap note: "Numbers are NOT contiguous — gaps may appear when tasks are deleted (e.g., 1, 2, 5 if tasks 3 and 4 were deleted). Like GitHub issue numbers." (00-overview.md line 160).

  4. P2: Task count - FIXED. Shows "~37 tasks across 9 milestones" (00-overview.md line 356).

  5. P2: PR description - FIXED. Mentions "9-milestone plan (37 tasks)" in the summary.

  6. P2: Explicit M4.1 → M1.5 dependency - FIXED. Task 4.1 now explicitly states: "Depends on: M1 Task 1.5 (numeric task IDs for fallback), M1 Task 1.6 (shared slugify utility)" (04-worktree-isolation.md line 36).

Summary

All requested changes have been properly implemented. The plan is internally consistent and ready for implementation.

Recommendation: APPROVE

@lsm lsm merged commit 55da4d6 into dev Mar 27, 2026
64 of 66 checks passed